Okay, so you want to take the Jupyter notebook (a.k.a. the Colab) and have that experiment show up in Trains, then use the Trains UI to launch it remotely on one of the machines running the trains-agent. Is that correct?
Hmm, is conda_freeze set in the clearml.conf on the development machine?
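If it helps, I think that setting lives under the sdk.development section of clearml.conf on the development machine. A minimal sketch, assuming the key is detect_with_conda_freeze (that name is my assumption, please double-check it against your config reference):
```
sdk {
    development {
        # assumption: when true, the full conda environment freeze is stored
        # with the experiment instead of only the detected packages
        detect_with_conda_freeze: true
    }
}
```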
HealthyStarfish45 We are now working on improving the k8s glue (due to be finished next week); after that we can take a stab at slurm, it should be quite straightforward. Will you be able to help with a bit of testing (setting up a slurm cluster is always a bit of a hassle 🙂)?
hmmm, somehow I have a bad feeling about it... Could you check the log? It should say something like "Collecting torch==1.6.0.dev20200421+cu101 from https://"
It should be right at the top of the installation. What do you have there?
Yes JitteryCoyote63 I think you are correct, this is currently the easiest way to do it. PompousParrot44 notice that you should have a "services" queue with a trains-agent in "services mode" running, to enqueue those types of mostly-sleeping services 🙂
I was thinking we could quickly create a service that does that, maybe leveraging one of these?
https://github.com/mehrdadmhd/scheduler-py
https://github.com/dbader/schedule
WDYT?
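For instance, with the second one (dbader/schedule) the polling loop of such a service could be a minimal sketch like this (the job body and the daily 03:00 interval are placeholder assumptions):
```python
import time

import schedule  # https://github.com/dbader/schedule


def scheduled_job():
    # placeholder for the actual service logic,
    # e.g. cloning and enqueuing a ClearML Task
    print("running scheduled job")


# run the job every day at 03:00 (example interval)
schedule.every().day.at("03:00").do(scheduled_job)

while True:
    schedule.run_pending()
    time.sleep(60)
```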
ElegantCoyote26 I don't think Keras logs it anywhere unless you have TB (TensorBoard), so there's nowhere to take the data from...
In short, yes you have to have TB :)
Click on the "k8s_schedule" queue, then on the right-hand side you should see your Task. Click on it to open the Task page, then click on the "Info" tab and look for "STATUS MESSAGE" and "STATUS REASON". What do you have there?
Any idea where that could come from? Could we turn off the local logging as well - in these kinds of runs we don't need it?
It is supposed to create it automatically... I tested with other examples (clearml version 1.7.3rc1) and everything seems to work
What am I missing? How do we recreate the issue? Can you verify it is still not working with the latest RC?
yes, TrickySheep9 use the k8s glue from here:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
What I mean is that I don't need to have cudatoolkit installed in the current conda env, right?
Wait, are you using conda as the package manager?
EDIT: meaning configured in trains.conf as the package manager
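For reference, a minimal sketch of what that looks like in trains.conf on the machine running the agent (the conda value shown is just the example here; pip and poetry are the other options I know of):
```
agent {
    package_manager {
        # which package manager the agent uses to recreate the environment
        type: conda
    }
}
```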
MelancholyBeetle72 thanks! I'll see if we could release an RC with a fix soon, for you to test :)
error: [Errno 13] Permission denied
Seems like a permission issue?
Try to remove your entire clearml cache folder
Hi SteadyFox10, the way it works is that Trains limits the debug image history by reusing the same file names, so the UI will only present the iterations for which the debug images are relevant. With your sample code it looks like it exposes a bug: the generated link should contain the iteration number, but it does not, so it overwrites the debug images every iteration. Here is the image link: https://demofiles.trains.allegro.ai/Test/test_images.6ed32a2b5a094f2da47e6967bba1ebd0/metrics/Test/te...
It should all be logged at the end, as I understand it
Hmm let me check the code for a minute
Hi PompousParrot44
What do you have in the Execution/"script path" ?
Then we can figure out what can be changed so CML correctly registers process failures with Hydra
JumpyPig73 quick question: does the state of the Task change immediately when it crashes? Are you running it with an agent (that Hydra triggers)?
If this is vanilla clearml with Hydra runners, what I suspect happens is that Hydra is overriding the signal callback clearml adds (like Hydra, clearml needs to figure out if the process crashed), so clearml's callback is never cal...
Hi FancyWhale93, pipe.start() should actually stop the local pipeline logic execution and fire it on the "services" queue.
The idea is that you can launch the pipeline locally, but the actual execution of the entire logic is remote.
You can have the pipeline logic run locally if you call pipe.start_locally,
or also run the steps locally (as subprocesses) with pipe.start_locally(run_pipeline_steps_locally=True)
BTW: based on your example, a more intuitive code might be the pi...
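To make the three launch modes concrete, a minimal sketch (the pipeline/step names are hypothetical, and the step assumes an existing template Task):
```python
from clearml import PipelineController

pipe = PipelineController(name="example pipeline", project="examples", version="1.0.0")
# hypothetical step referencing an existing template Task
pipe.add_step(name="step_one", base_task_project="examples", base_task_name="step one task")

# default: stop local execution and run the pipeline logic on the "services" queue
pipe.start(queue="services")

# or keep the pipeline logic running locally (steps still executed by agents):
# pipe.start_locally()

# or run everything locally, with the steps as subprocesses:
# pipe.start_locally(run_pipeline_steps_locally=True)
```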
That is a good question ... let me check 🙂
Hi ShinyWhale52
This is just a suggestion, but this is what I would do:
1. Use clearml-data and create a dataset from the local CSV file:
clearml-data create ...
clearml-data sync --folder (where the csv file is)
2. Write a python script that takes the csv file from the dataset and creates a new dataset of the preprocessed data
```
from clearml import Dataset

original_csv_folder = Dataset.get(dataset_id=args.dataset).get_local_copy()
# process csv file -> generate a new csv
preproces...
```
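The truncated snippet above would continue along these lines; a sketch only, with hypothetical names and paths:
```python
import argparse

from clearml import Dataset

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", help="id of the original csv dataset")
args = parser.parse_args()

# get a local copy of the original csv dataset
original_csv_folder = Dataset.get(dataset_id=args.dataset).get_local_copy()

# ... preprocess the csv and write the result into `preprocessed_folder` ...
preprocessed_folder = "/tmp/preprocessed"  # hypothetical output folder

# register the preprocessed data as a new dataset, keeping lineage to the original
new_dataset = Dataset.create(
    dataset_name="preprocessed data",  # hypothetical name
    dataset_project="examples",        # hypothetical project
    parent_datasets=[args.dataset],
)
new_dataset.add_files(preprocessed_folder)
new_dataset.upload()
new_dataset.finalize()
```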
PompousParrot44 obviously you can just archive a task and run the cleanup service; it will actually delete archived tasks older than X days.
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
The reasoning is that, most likely, simultaneous processes will fail on the GPU due to the memory limit
Any idea why I cannot select text inside the table?
Ichh, seems like plotly again 🙂 I have to admit it's quite annoying to me as well... I would vote for it upstream.
It's just another flag when running the trains-agent
You can have multiple services-mode instances, there is no actual limit 🙂
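For reference, launching an agent in services mode is a minimal sketch along these lines (assuming a trains-agent version with services-mode support; "services" is just the conventional queue name):
```
trains-agent daemon --services-mode --queue services
```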
Hi JumpyPig73 , I think it was synced to github. You can already test with: pip install git+https://github.com/allegroai/clearml.git
ResponsiveHedgehong88 so I would suggest using execute_remotely in your code: basically you start locally, make sure everything is passed as intended, then from within the code you call task.execute_remotely(...)
which will stop the current process and enqueue the Task on the selected queue for the agent to execute.
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
This way you can both easily test...
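In code, that pattern is a minimal sketch like this (the project/queue names are hypothetical):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# ... verify locally that arguments/configuration are picked up as intended ...

# stops the local process and enqueues the Task on the selected queue
task.execute_remotely(queue_name="default", exit_process=True)

# code below this line only runs when the agent executes the Task
```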
GreasyPenguin14 could you test with the 0.17.5rc4?
Also, what's the PyCharm version / OS?