JitteryCoyote63

214 Questions, 1021 Answers

Active since 10 January 2023

Last activity 7 months ago

Badges 1

979 × Eureka!

0 Votes

4 Answers

967 Views

Hey There, Is There A Way To Access The Trains Configuration Programmatically At Runtime In A Task (The Configuration That Is Dumped By The Agent In The Logs Before Executing A Task)

Hey there, is there a way to access the trains configuration programmatically at runtime in a task (the configuration that is dumped by the agent in the logs...

mlops

4 years ago

0 Votes

1 Answers

913 Views

Hi There, Would It Be Possible To Add Some Neural Architecture Search Example, As For The Hyperparameter Optimizer Examples?

Hi there, would it be possible to add some Neural Architecture Search example, as for the HyperParameter Optimizer examples?

clearml

3 years ago

0 Votes

11 Answers

1K Views

Hi, Some Properties Of The Task Object Are Not Listed In The Documentation (Such As Task.Parent, Which Is Not Clear Whether It Is The Parent Task Object Itself Or The Id Of The Parent Task).

Hi, some properties of the Task object are not listed in the documentation (such as task.parent, which is not clear whether it is the parent task object itse...

clearml

4 years ago

0 Votes

10 Answers

1K Views

Hey Guys, I Am Setting Up A New Machine With Two Rtx 3070 Gpus Where I Created Two Agents (One For Each Gpu). On Both Agents, My Experiments Fail With Error:

Hey guys, I am setting up a new machine with two rtx 3070 GPUs where I created two agents (one for each GPU). On both agents, my experiments fail with error:...

pytorch

4 years ago

0 Votes

18 Answers

975 Views

Hello There, I Would Like To Run Cleanup Code In Case The User Aborts One Task From The Dashboard (The Agent Is Not Using The Task In Docker). What Signal Should I Listen For In The Task?

Hello there, I would like to run cleanup code in case the user aborts one task from the dashboard (the agent is not using the task in docker). What signal...

mlops

4 years ago
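
A minimal standard-library sketch of the cleanup idea. It assumes (this is not confirmed here) that aborting a task from the dashboard ends up signalling the task's process with SIGTERM or SIGINT; install_cleanup_handlers and the cleanup callable are hypothetical names. Exiting through sys.exit from the handler also lets finally blocks and atexit hooks run, which the default SIGTERM action would skip.

```python
import signal
import sys
from typing import Callable


def install_cleanup_handlers(cleanup: Callable[[], None]) -> Callable:
    """Run `cleanup` when the process receives SIGTERM or SIGINT, then exit.

    Assumption: an abort reaches this process as one of these two signals.
    """
    def handler(signum, frame):
        try:
            cleanup()
        finally:
            # Conventional exit status for "terminated by signal N" is 128 + N.
            sys.exit(128 + signum)

    # signal.signal() must be called from the main thread.
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, handler)
    return handler
```

Call it once near the top of the training script, passing whatever removes the task's temporary resources.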

0 Votes

2 Answers

1K Views

Hi There, Congrats For Releasing V1

Hi there, congrats for releasing v1 😄 I observed that with pytorch ignite (4.2.0), the metrics of the validation engines are delayed by one epoch. I am not ...

pytorch

3 years ago

0 Votes

5 Answers

936 Views

Hi, I Have A Long Running Experiment That Was Running On Aws Instance That Got Killed After ~4 Days With The Following Reason:

Hi, I have a long running experiment that was running on AWS instance that got killed after ~4 days with the following reason: STATUS REASON: Forced stop (no...

clearml

2 years ago

0 Votes

19 Answers

1K Views

Hi Again, I Am Trying To Make The AWS Autoscaler Work With EC2 Instances, But It Fails To Set Up The Agent On The Machine: The Logs Of The User-Data Script Show That It Fails Updating The Machine (See Below)

Hi again, I am trying to make the AWS autoscaler work with EC2 instances, but it fails to set up the agent on the machine: the logs of the user-data script sh...

aws mlops

3 years ago

0 Votes

2 Answers

1K Views

Hi, A Small Bug (Not Really A Bug) In The Autoscaler: I Have P3.2Xlarge Instances That Take A Long Time To Shut Down. With

Hi, a small bug (not really a bug) in the autoscaler: I have p3.2xlarge instances that take a long time to shut down. With polling_interval_time_min=1 , the a...

mlops

3 years ago

0 Votes

1 Answers

1K Views

Hey, Just Wanted To Mention: In Docs, Task.Get_Parameter Does Not Say:

Hey, just wanted to mention: in docs, Task.get_parameter does not say: Different sections with key prefix "section/" , as Task.get_parameters does. Also there ...

clearml

4 years ago
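
The flat "section/key" naming mentioned above (for example "General/artifact_name") can be regrouped by section with a small helper. This is a hypothetical standard-library sketch, not part of the trains/clearml API; keys without a "/" go into a "" bucket by assumption.

```python
from collections import defaultdict
from typing import Dict


def group_by_section(params: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Turn {"Section/key": value} into {"Section": {"key": value}}."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for full_name, value in params.items():
        # Split on the first "/" only, so the key part may itself contain "/".
        section, sep, key = full_name.partition("/")
        if not sep:
            section, key = "", full_name  # no section prefix
        grouped[section][key] = value
    return dict(grouped)
```

For instance, a flat dict such as the one returned by task.get_parameters() could be passed in, and group_by_section(...).get("General", {}) would then hold the General section only.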

0 Votes

2 Answers

1K Views

Hi, Where Can I Find The Logs Of Trains-Agent By Default?

Hi, where can I find the logs of trains-agent by default?

clearml

4 years ago

0 Votes

1 Answers

1K Views

Hi There, Would It Be Possible For The Autoscaler To Support Stopping Instances Instead Of Terminating Them? My Use Case Is The Following: I Am Continuing My Journey With The Clearml-Session Tool, And In Case The Clearml-Session Is Running In A Ec2 Inst

Hi there, would it be possible for the autoscaler to support stopping instances instead of terminating them? My use case is the following: I am continuing my...

mlops remote-ssh

2 years ago

0 Votes

4 Answers

1K Views

Hi There, I Am Trying To Start An Agent In Services Mode With Trains-Server Being On Localhost (But Not Started Together With The Docker-Compose!). My Trains.Conf Is The Following:

Hi there, I am trying to start an agent in services mode with trains-server being on localhost (but not started together with the docker-compose!). My trains...

mlops

4 years ago

0 Votes

4 Answers

955 Views

Hi, In The Metric Snapshot Section Of The Overview Tab Of A Project Page, Would It Be Possible To:

Hi, in the Metric Snapshot section of the Overview tab of a project page, would it be possible to: Show running experiments Have the legend clickable, to hid...

clearml

2 years ago

0 Votes

3 Answers

976 Views

Hi Quick Question: Does Task.Connect_Configuration Support Omegaconf Dictconfig Objects? Ie. Can I Do:

Hi quick question: does Task.connect_configuration support OmegaConf DictConfig objects? ie. Can I do: config = train_task.connect_configuration(OmegaConf.lo...

clearml

2 years ago

0 Votes

3 Answers

1K Views

Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

Hi, in a subproject, would it be possible to hide the parent project if it is empty?

clearml

3 years ago

0 Votes

7 Answers

951 Views

Hi, Is There A Way To Get Some Stats About The Use Of Workers? I Would Like To Know, Over The Past 3 Months:

Hi, is there a way to get some stats about the use of workers? I would like to know, over the past 3 months: Number of training hours per user Number of trai...

clearml

3 years ago
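
As a starting point for the "training hours per user" figure, here is a sketch that sums run durations per user inside a reporting window. It assumes the run records (user, start time, end time) have already been exported somehow; fetching them from the server is not shown, and the record shape is hypothetical. 90 days stands in for "the past 3 months".

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (user, started, ended).
Run = Tuple[str, datetime, datetime]


def training_hours_per_user(runs: Iterable[Run], now: datetime,
                            window: timedelta = timedelta(days=90)) -> Dict[str, float]:
    """Sum the hours each user's runs spent inside [now - window, now]."""
    since = now - window
    hours: Dict[str, float] = defaultdict(float)
    for user, started, ended in runs:
        # Clip each run to the reporting window before measuring it.
        start, end = max(started, since), min(ended, now)
        if end > start:
            hours[user] += (end - start).total_seconds() / 3600
    return dict(hours)
```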

0 Votes

10 Answers

1K Views

Hi Guys, Any Plan To Integrate The

Hi guys, any plan to integrate the https://github.com/allegroai/trains-agent/blob/master/examples/dynamic_cloud_cluster.ipynb in trains-server? The code ther...

clearml

4 years ago

0 Votes

5 Answers

934 Views

How Can I Do The Following? (Basically, Filtering By Task Type)

How can I do the following? (basically, filtering by task type) Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(type="trainin...

clearml

4 years ago

0 Votes

2 Answers

958 Views

Hello, What Is The Default Limit For Global Context?

Hello, what is the default limit for global context? https://allegro.ai/docs/storage_manager_storagemanager.html#trains.storage.manager.StorageManager.get_l...

clearml

4 years ago

0 Votes

19 Answers

1K Views

I Guess One Experiment Is Running Backwards In Time

I guess one experiment is running backwards in time 😄

clearml

2 years ago

0 Votes

7 Answers

977 Views

Hi, I Think There Is A Small Bug In The

Hi, I think there is a small bug in the Experiment running time column of the workers-and-queues/workers page: they do not match the time reported in the exp...

clearml

3 years ago

0 Votes

5 Answers

984 Views

Hey There, Since Which Version Has ClearML Stopped Connecting To The Demo Server By Default?

Hey there, since which version has clearml stopped connecting to the demo server by default?

clearml

3 years ago

0 Votes

6 Answers

1K Views

Hi, Is It Possible To Specify The Required Version Of Python For A Task That Is Different From The Python Running The Clearml-Agent? Example: My Clearml-Agent Is Running On Python 3.8 And I Need A Task To Run On Python 3.10. How Can I Do That?

Hi, is it possible to specify the required version of python for a Task that is different from the python running the clearml-agent? Example: my clearml-agen...

clearml

2 years ago

0 Votes

6 Answers

1K Views

Hi There, Is It Possible To Configure The Clearml-Agent To Run Some Commands Before Running Each Experiment It Launches? Eg.

Hi there, is it possible to configure the clearml-agent to run some commands before running each experiment it launches? Eg. echo "test" > "test.txt" && <-- ...

clearml

3 years ago

0 Votes

5 Answers

1K Views

Hi Again, It Seems Like The AWS Autoscaler Is Not Spinning Up Instances With The EBS Configuration I Configured. Here Is The Configuration:

Hi again, it seems like the AWS autoscaler is not spinning up instances with the EBS configuration I configured. Here is the configuration: resource_configurati...

aws mlops

3 years ago

0 Votes

30 Answers

1K Views

Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built

Hello, I would like to use spot instances together with the AWS autoscaler to train models with pytorch/ignite and I am wondering how to support interruption...

mlops

3 years ago
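
One common pattern for surviving spot interruptions is to checkpoint after every epoch to storage that outlives the instance and to resume from the last checkpoint when the task is relaunched. Below is a framework-agnostic sketch of that loop. JSON stands in for real model/optimizer state (with PyTorch/Ignite you would save state dicts instead). The function names are hypothetical, and this is not an existing trains/clearml or Ignite helper.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Return the last saved checkpoint, or a fresh one if none exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "state": None}


def save_checkpoint(path: str, checkpoint: Dict[str, Any]) -> None:
    """Write to a temp file, then rename: a process killed mid-write
    leaves the previous checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(checkpoint, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def train(path: str, num_epochs: int,
          step: Callable[[int, Any], Any]) -> Dict[str, Any]:
    """Resume from the last completed epoch and checkpoint after each one."""
    checkpoint = load_checkpoint(path)
    for epoch in range(checkpoint["epoch"], num_epochs):
        state = step(epoch, checkpoint["state"])
        checkpoint = {"epoch": epoch + 1, "state": state}
        save_checkpoint(path, checkpoint)
    return checkpoint
```

If the instance disappears during an epoch, only that epoch is lost. The relaunched task calls train again with the same path and continues from there.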

0 Votes

1 Answers

967 Views

Hi, Would It Be Possible To Parse Torch Requirement When It’s Part Of The Extras_Require Dict? In My Code, I Have The Following:

Hi, would it be possible to parse torch requirement when it’s part of the extras_require dict? In my code, I have the following: train_task._update_requireme...

mlops

3 years ago
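
This sketch covers only the parsing part of the request: it finds the torch line in an extras_require-style dict. The dict contents and find_requirement are hypothetical, and how the result would be fed to the task (the thread mentions the private train_task._update_requirements) is left out. Names are compared after PEP 503 normalization (runs of "-", "_" and "." are equivalent), and the rest of the requirement string (version specifier, markers) is returned as-is.

```python
import re
from typing import Dict, List, Optional

# Leading distribution name of a PEP 508 requirement, e.g. "torch" in "torch>=1.7".
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    # PEP 503: case-insensitive, and runs of "-", "_", "." are equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_requirement(extras_require: Dict[str, List[str]], package: str) -> Optional[str]:
    """Return the first requirement for `package` listed under any extra, or None."""
    wanted = _normalize(package)
    for requirements in extras_require.values():
        for requirement in requirements:
            match = _NAME.match(requirement)
            if match and _normalize(match.group(1)) == wanted:
                return requirement.strip()
    return None
```

Because the whole name is compared, a "torch" lookup does not accidentally match "torchvision".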

0 Votes

30 Answers

1K Views

Hi, I Am Giving Another Try To Clearml-Session And I Am Stuck On The Following Error, Shown When The CLI Tries To Establish The Tunnel:

Hi, I am giving another try to clearml-session and I am stuck on the following error, shown when the CLI tries to establish the tunnel: Starting SSH tunnel W...

remote-ssh

2 years ago

0 Votes

2 Answers

1K Views

Hi, Is It Still True That --Services-Mode Only Supports Docker Mode?

Hi, Is it still true that --services-mode only supports docker mode?

clearml

3 years ago

0 Hi, I Am Considering Making Automated Backups Of My Clearml-Server Using Amazon Ebs Snapshots. Should I Be Concerned With The Same Problem Described Here >

I could probably write a Python script that checks whether any tasks are running or pending and, if not, runs docker-compose down to stop the clearml-server, uses boto3 to trigger the creation of an EBS snapshot, waits until it has finished, and then restarts the clearml-server. wdyt?

3 years ago
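
The backup plan above, written as a sketch. Each step is passed in as a callable, so the ordering and the "always restart the server" guarantee can be checked without touching Docker or AWS. The parameter names are hypothetical. In practice, stop_server/start_server could shell out to docker-compose down/up, and start_snapshot/snapshot_done could wrap the boto3 EC2 client's create_snapshot and describe_snapshots calls; that wiring is an assumption and is not shown.

```python
import time
from typing import Callable


def backup_when_idle(
    has_active_tasks: Callable[[], bool],
    stop_server: Callable[[], None],
    start_snapshot: Callable[[], str],
    snapshot_done: Callable[[str], bool],
    start_server: Callable[[], None],
    poll_seconds: float = 30.0,
) -> bool:
    """Snapshot the server volume only when no task is running or pending.

    Returns True if a backup was taken, False if it was skipped.
    """
    if has_active_tasks():
        return False  # skip this round; the caller can retry later
    stop_server()
    try:
        snapshot_id = start_snapshot()
        while not snapshot_done(snapshot_id):
            time.sleep(poll_seconds)
    finally:
        # Bring the server back even if the snapshot step raised.
        start_server()
    return True
```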

0 Hello There, Is There A Parameter To Configure The Number Of Columns Rendered In The Preview Area Of The Csv Artifacts? (Some Of Them Are Truncated With “…”)

Nice, the preview param will do 🙂 btw, I love the new docs layout!

3 years ago

0 Hi, I Encounter A Weird Behavior: I Have A Task A That Schedules A Task B. Task B Is Executed On An Agent, But With An Old Commit

Yes, the task is created using Task.clone().

4 years ago

0 Hi There

Could be also related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300

4 years ago

0 Hi There

basically:
```python
from trains import Task

task = Task.init("test", "test", "controller")
task.upload_artifact("test-artifact", dict(foo="bar"))
cloned_task = Task.clone(task, name="test", parent=task.task_id)
cloned_task.data.script.entry_point = "test_task_b.py"
cloned_task._update_script(cloned_task.data.script)
cloned_task.set_parameters(**{"artifact_name": "test-artifact"})
Task.enqueue(cloned_task, queue_name="default")
```

4 years ago

0 Hi There

Here is the minimal reproducible example.
Run test_task_a.py: it registers a dummy artifact, creates a new task, sets a parameter in that task and enqueues it. test_task_b then tries to retrieve the parameter from the parent task and fails.

4 years ago

0 Hi There

Yes, this is correct. I am trying to create a minimal reproducible example.

4 years ago

0 Hi There

AgitatedDove14 I cannot confirm it 100%; the context is different (see previous messages), but it could be the same bug behind the scenes...

4 years ago

0 Hi There

What is weird is:
- Executing the task from an agent: task.get_parameters() returns an empty dict.
- Calling task.get_parameters() from a local standalone script returns the correct properties, as shown in the web UI, even if I updated them in the UI.

So I guess the problem comes from trains-agent?

4 years ago

0 Hi There

Thanks for your inputs, I will try that! For completeness, here is how I retrieve the parameters:
```python
from trains import Task

task = Task.init("test", "test")
parent_task = Task.get_task(task.parent)
# report_text expects a string, so convert the parameters dict first
task.get_logger().report_text(str(task.get_parameters()))
artifact_name = task.get_parameter("General/artifact_name")
artifact = parent_task.artifacts[artifact_name].get()
```

4 years ago

0 Hi There

So in my minimal reproducible example it does work 🤣 Very frustrating; I will keep searching for that nasty bug.

4 years ago

0 Hi All, How Can I Have A Global Variable Used In A Pipeline Step? I Have To Define Them In Each Pipeline Step, Otherwise They Are Not Included In The Pipeline Step

yes

8 months ago

0 Hi Guys, Last Night One Of Our Agents (0.16.1) Was Disconnected From Our Trains-Server While Executing An Experiment. I Saw That Because The Experiment It Was Running Had The Status Aborted And I Could Not See The Agent In The List Of Available Workers. H

very cool, good to know, thanks SuccessfulKoala55 🙂

4 years ago

0 Hi, I Have A Configuration File That I Read And Connect To My Training Tasks. I Cannot Use

Hi SuccessfulKoala55 , super that’s what I was looking for

3 years ago

0 Hi, I Am Trying To Update The Aws_Autoscaler To The Latest Version On The Master Branch. I Simply Changed The Commit Id In The Experiment And Run It, This Gave Me The Following Error:

Indeed, I actually had the old configuration that was not JSON - I converted to json, now works 🙂

3 years ago

0 Hi, How Can I Get The Logs From The Pytorch Ignite Early Stopping Handler To Be Logged In Clearml?

AgitatedDove14 yes, but I don't see in the docs how to attach it to the logger of the EarlyStopping handler

3 years ago

0 Hi, On Clearml-Server 1.5.0, In Scalar Graphs, The New Default Value Is “Show Closest Data On Hover”. Would It Be Possible To Make It Automatically Set To “Compare Data On Hover” When Comparing Multiple Experiments?

Very nice! Maybe we could have this option as a toggle setting in the user profile page, so that by default we keep the current behaviour, and users like me can change it 😄 wdyt?

2 years ago

0 Hi, I Would Like To Bring Awareness

and I didn't have this problem before because, when cu117 wheels were not available, the agent tried to get the wheel with the closest CUDA version and fell back to 1.11.0+cu115, which worked

one year ago
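
A sketch of the fallback behaviour described above: it picks the newest available CUDA build that is not newer than the requested one. This is an illustration under assumptions (the "cuXYZ" tag format, with the last digit as the minor version). It is not clearml-agent's actual resolution logic, and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple


def cuda_version(tag: str) -> Tuple[int, int]:
    """Parse a wheel tag such as "cu117" into (11, 7); assumes a one-digit minor."""
    match = re.fullmatch(r"cu(\d+)(\d)", tag)
    if match is None:
        raise ValueError(f"not a CUDA tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def closest_cuda_build(wanted: str, available: Iterable[str]) -> Optional[str]:
    """Return the newest available tag that is not newer than `wanted`, or None."""
    target = cuda_version(wanted)
    candidates = [tag for tag in available if cuda_version(tag) <= target]
    return max(candidates, key=cuda_version, default=None)
```

With only cu102, cu113 and cu115 builds available, a request for cu117 resolves to cu115 under this rule.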

0 Hi, In The Metric Snapshot Section Of The Overview Tab Of A Project Page, Would It Be Possible To:

No, it doesn't! 3. They select any point that is an improvement over time

2 years ago

0 Hi, In The Metric Snapshot Section Of The Overview Tab Of A Project Page, Would It Be Possible To:

Thanks! 3. I don't know, I have never used Highcharts 🙂

2 years ago

0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

I am not using Hydra; I am reading the conf with:
```python
config_dict = read_yaml(conf_yaml_path)
config = OmegaConf.create(task.connect_configuration(config_dict))
```

2 years ago

0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

But I am not sure it will connect the parameters properly, I will check now

2 years ago

0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

Doing it the other way around works:
```python
cfg = OmegaConf.create(read_yaml(conf_yaml_path))
config = task.connect(cfg)
type(config)
# <class 'omegaconf.dictconfig.DictConfig'>
```

2 years ago

0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

But then why do I have to do task.connect_configuration(read_yaml(conf_path))._to_dict()?
Why not simply task.connect_configuration(read_yaml(conf_path))?
I mean, what is the benefit of returning a ProxyDictPostWrite instead of a dict?

2 years ago

0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

Same, it also returns a ProxyDictPostWrite, which is not supported by OmegaConf.create

2 years ago
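
If the goal is simply to hand the connected configuration to OmegaConf, one workaround is to copy the returned object into plain containers first. This sketch assumes ProxyDictPostWrite behaves like a standard Mapping; the ._to_dict() call mentioned above suggests a built-in conversion exists as well. to_plain is a hypothetical helper. The copy is detached: judging by its name, the proxy exists to write changes back to the task, so edits to the plain copy would presumably not be synced.

```python
from collections.abc import Mapping
from typing import Any


def to_plain(obj: Any) -> Any:
    """Recursively copy Mappings into plain dicts and lists/tuples into lists."""
    if isinstance(obj, Mapping):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    return obj  # scalars are returned unchanged
```

Usage would then look like OmegaConf.create(to_plain(task.connect_configuration(config_dict))).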

0 Hi, In A Subproject, Would It Be Possible To Hide The Parent Project If It Is Empty?

I mean, inside a parent, do not show the project [parent] if there is nothing inside

3 years ago

0 Hi, One More Question: When Creating A Task With Task.Init(), We Can Specify The

correct, you could also use Task.create, which creates a Task but does not do any automagic.

Yes, I haven't used it so far because I didn't know what to expect, since the docs state:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."

4 years ago

Because it lives behind a VPN and GitHub workers don’t have access to it

2 years ago

0 Hey There, Does Trains Support

No worries! I asked more to be informed; I don't have a real use case behind it. Does this mean that you guys internally catch the argparse parser object somehow? Because you could also simply use sys.argv to find the parameters, right?

4 years ago
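
This standard-library illustration shows the difference the question hints at. It does not describe how trains hooks argparse. It only shows that a naive sys.argv scan sees just the flags actually passed, and only as strings. The ArgumentParser also knows every declared argument's default and type. The --lr/--epochs arguments are made up for the example.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


def raw_argv_params(argv: List[str]) -> Dict[str, str]:
    """Naive scan of "--key value" pairs: only what was typed, as strings."""
    return {flag.lstrip("-"): value for flag, value in zip(argv[::2], argv[1::2])}
```

With argv = ["--lr", "0.1"], the raw scan yields only {"lr": "0.1"}. Parsing yields lr=0.1 as a float, plus epochs=10 from the declared default.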

0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

Some more context: the second experiment finished, and now, in the workers & queues tab of the UI, I randomly see one of these two rows from one page refresh to the next:
trains-agent-1 | - | - | - | ...
trains-agent-1 | long-experiment | 12h | 72000 |

4 years ago
