Hi CostlyOstrich36, I mean insert temporary access keys
Well no luck - using matplotlib.use('agg') in my training codebase doesn't solve the mem leak
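For reference, this is roughly what I tried (a minimal sketch; the loop just stands in for my training code that creates figures):
```python
import matplotlib
# Select the non-interactive Agg backend before pyplot is imported anywhere
matplotlib.use("agg")

import matplotlib.pyplot as plt
import numpy as np

for i in range(100):
    fig, ax = plt.subplots()
    ax.plot(np.random.rand(100))
    plt.close(fig)  # close explicitly so the figure's memory can be released
```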
This is what I get with mprof on this snippet above (I killed the program after the bar reaches 100%, otherwise it hangs trying to upload all the figures)
as for disk space: I have 21GB available (8GB used), and the /opt/trains/data folder is about 600MB
I think the best-case scenario would be for ClearML to maintain a GitHub Action that spins up a dummy clearml-server, so that anyone could use it as a basis for their tests: you would only have to point the server URL at the local instance started by the action and could then test all your code seamlessly. Wdyt?
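Concretely, a test would then only need to point the SDK at the local server started by the action, something like this (a sketch; the ports are the clearml-server defaults, and the credentials, project and task names are placeholders):
```python
import os

# Point the ClearML SDK at the dummy server started by the GitHub Action
os.environ["CLEARML_API_HOST"] = "http://localhost:8008"
os.environ["CLEARML_WEB_HOST"] = "http://localhost:8080"
os.environ["CLEARML_FILES_HOST"] = "http://localhost:8081"
os.environ["CLEARML_API_ACCESS_KEY"] = "dummy-access-key"
os.environ["CLEARML_API_SECRET_KEY"] = "dummy-secret-key"

from clearml import Task

# The test code itself stays unchanged
task = Task.init(project_name="ci-tests", task_name="smoke-test")
```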
I execute the clearml-agent this way:
/home/machine/miniconda3/envs/py36/bin/python3 /home/machine/miniconda3/envs/py36/bin/clearml-agent daemon --services-mode --cpu-only --queue services --create-queue --log-level DEBUG --detached
Hi CumbersomeCormorant74, yes, this is almost the scenario: I have a dozen projects. In one of them, I have ~20 archived experiments in different states (draft, failed, aborted, completed). I went to this archive, selected all of them and deleted them using the bulk delete operation. I got several failed-delete popups, so I tried again with smaller batches (like 5 experiments at a time) to isolate the experiments causing the error. I could delete most of them. At some point, all ...
I also tried setting ebs_device_name = "/dev/sdf" - didn't work
It seems that around here, a Task created with Task.init in the main process of a remotely executed run gets its output_uri parameter ignored
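To illustrate what I mean (a sketch; the project name, task name and bucket URL are placeholders):
```python
from clearml import Task

# When this runs remotely under an agent, the output_uri below seems to be
# ignored and artifacts go to the default files server instead
task = Task.init(
    project_name="my-project",    # placeholder
    task_name="training",         # placeholder
    output_uri="s3://my-bucket/", # placeholder bucket
)
```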
But I see in the agent logs:
Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', ...
As a quick fix, can you test with auto refresh (see the top-right button with the pause sign in your video)?
That doesn’t work unfortunately
CostlyOstrich36 good enough, I will fall back to sorting by updated, thanks!
AgitatedDove14 After investigation, another program on the machine consumed all the available memory, most likely causing the OS to kill the agent/task
DeterminedCrab71 This is the behaviour when holding shift while selecting in Gmail; if ClearML could reproduce this, that would be perfect!
If I manually call report_matplotlib_figure, yes. If I don't (I just create the figure), there is no mem leak
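For context, the manual call I am talking about looks roughly like this (a sketch of the reporting loop; project and task names are placeholders):
```python
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt
import numpy as np
from clearml import Task

task = Task.init(project_name="debug", task_name="matplotlib-mem-leak")
logger = task.get_logger()

for i in range(100):
    fig, ax = plt.subplots()
    ax.plot(np.random.rand(100))
    # Memory only grows when the figure is explicitly reported
    logger.report_matplotlib_figure(title="debug", series="loss", iteration=i, figure=fig)
    plt.close(fig)
```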
that would work for pytorch and clearml yes, but what about my local package?
Something like that?
` curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{
"match": {
"variant": "loss_model"
}
},
{
"match": {
"task": "8f88e4b8cff84f23bde74ed4b7213ec6"
}
}
]
}
},
"aggs": {
"series": {
"terms": { "field": "iter" }
}
}
}...
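Or the same query sent from Python, if that is easier to tweak (a sketch using requests against the same local Elasticsearch; the index name and task id are the ones from the curl above):
```python
import requests

# Same query as the curl above, against the Elasticsearch used by the server
index = "events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b"
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"variant": "loss_model"}},
                {"match": {"task": "8f88e4b8cff84f23bde74ed4b7213ec6"}},
            ]
        }
    },
    "aggs": {"series": {"terms": {"field": "iter"}}},
}

resp = requests.get(f"http://localhost:9200/{index}/_search?pretty", json=query)
print(resp.json())
```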
trains-elastic container fails with the following error:
And so in the UI, in the Workers & Queues tab, I randomly see one of the two experiments for the worker that is running both experiments
And now that I restarted the server and went back into the project where I initially deleted the archived experiments, some of them are still there - I will leave them alone, too scared to do anything now 😄
How exactly is the clearml-agent killing the task?
"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"
I think waiting for the apt locks to be released with something like this would work:
startup_bash_script = [
    "#!/bin/bash",
    "while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done",
    "sudo apt-get update",
    ...
Weirdly this throws an error in the autoscaler:
` Spinning new instance type=v100_spot
Error: Failed to start new instance, unexpected '{' in field...
Hi AgitatedDove14, initially I was doing this, but then I realised that with the approach you suggest, all the packages of the local environment also end up in the “installed packages”, while in reality I only need the dependencies of the local package. That’s why I use _update_requirements: with this approach, only the required packages will be installed by the agent
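For reference, this is roughly how I use it (a sketch; _update_requirements is a private method, and the requirement strings below are just examples standing in for my local package's dependencies):
```python
from clearml import Task

task = Task.init(project_name="my-project", task_name="training")  # placeholders

# Replace the auto-detected "installed packages" with only what the local
# package really needs (example requirement strings)
task._update_requirements([
    "numpy>=1.19",
    "torch>=1.8",
])
```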