AgitatedDove14

49 Questions, 8124 Answers

Active since 10 January 2023

Last activity one year ago


Badges 1

25 × Eureka!


0 Clearml (Remote Execution) Sometimes Doesn't "Pick-Up" Gpu. After I Rerun The Task It Picks It Up. Seems Random, Doesn't Happen Too Often (Maybe Once In 30-40 Times) And I Cannot Seem To Detect Any Pattern. Did Anyone Else Notice This? Agents Are Vms On G

I'm not sure how to debug it, that would be my first question. So I should first check if docker is executed with --gpus? I'll pay attention to this next time this happens, thanks.

The first line of the Task console log should show the exact docker command that was used; this is a good place to start.
Also check whether there is any chance another agent is listening to this queue. Maybe the task actually runs somewhere without a GPU at all?
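Following the first suggestion, one quick check is whether that docker command requested GPUs at all. The helper below is a hypothetical sketch (`requests_gpus` is not a ClearML API). It only looks for the standard `docker run --gpus` flag, so a setup that exposes GPUs some other way (for example an NVIDIA runtime flag plus environment variables) would need its own check.

```python
import shlex


def requests_gpus(docker_cmd: str) -> bool:
    """Return True if a `docker run` command line asks for GPUs via --gpus.

    Hypothetical helper: paste in the docker command printed at the top of
    the Task console log. Handles both "--gpus all" and "--gpus=all" forms.
    """
    tokens = shlex.split(docker_cmd)
    for i, tok in enumerate(tokens):
        if tok.startswith("--gpus="):
            return True
        if tok == "--gpus" and i + 1 < len(tokens):
            return True
    return False


print(requests_gpus("docker run -t --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 bash"))  # True
print(requests_gpus("docker run -t nvidia/cuda:11.8.0-base-ubuntu22.04 bash"))  # False
```

Comparing the output for a run that picked up the GPU with one that did not would show whether the difference is already in the docker invocation.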

one year ago

0 Hi! I Have A Gpu Workstation At The Office (No Public Ip) With Latest Clearml-Agent Installed. When I Was In The Same Network - I Was Able To Use Clearml-Session From My Laptop. Now I Work From Home, And Clearml-Session Fails With

Oh, in that case add `--remote-gateway <external_ip>`; it will connect to the provided address instead of the local one. (You can also add `--public-ip`, which will automatically resolve the public IP of the server.)

4 years ago

0 Is There A Way How I Can Get How Many Minutes The Gpu Has Been Used In A Month? The Duration Of An Iteration Is For Every Run Different If You Vary Batch Size. Model, Or Other Stuff. I Want To Do A Crude Energy Consumption Calculation By Doing A Sum Over

Hi DefeatedCrab47
Do you mean per trains-agent, or accumulated over all experiments?
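For the crude energy estimate described in the question, the arithmetic is total GPU time multiplied by an average power draw. A minimal sketch, assuming you have already collected per-run start and end timestamps (however you export them) and assuming a constant average power draw per GPU; the 250 W figure is a placeholder, not a measured value.

```python
from datetime import datetime


def gpu_minutes(runs):
    """Sum run durations in minutes. `runs` is a list of (start, end) datetimes."""
    return sum((end - start).total_seconds() for start, end in runs) / 60.0


def energy_kwh(minutes, avg_power_watts):
    """Crude estimate: average power (W) x time (h), converted to kWh."""
    return avg_power_watts * (minutes / 60.0) / 1000.0


runs = [
    (datetime(2021, 3, 1, 10, 0), datetime(2021, 3, 1, 11, 30)),  # 90 min
    (datetime(2021, 3, 2, 9, 0), datetime(2021, 3, 2, 9, 30)),    # 30 min
]
total = gpu_minutes(runs)
print(total)                   # 120.0
print(energy_kwh(total, 250))  # 0.5 (placeholder 250 W average draw)
```

For multi-GPU runs, multiply each run's duration by its GPU count before summing.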

4 years ago

0 Hi All, I Am Getting A Bunch Of This Kind Of Log Messages "Clearml.Storage - Info - Starting Upload: /Tmp/.Clearml.Upload_Model_6Ou50Pb1.Tmp =>" I Am Pretty Sure They Happen As A Part Of The Model Initialization About 10 Of Those, My Guess Is That Every T

You can see the class here:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/clearml/binding/frameworks/__init__.py#L52

Basically you do:
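```python
from clearml.binding.frameworks import WeightsFileHandler


def my_callback(load_or_save, model):
    # type: (str, WeightsFileHandler.ModelInfo) -> Optional[WeightsFileHandler.ModelInfo]
    assert load_or_save in ('load', 'save')
    # do something, e.g. decide whether this model should be logged/uploaded
    skip = False  # placeholder: set from your own condition
    if skip:
        return None  # returning None skips this model
    return model


WeightsFileHandler.add_pre_callback(my_callback)
```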
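To make the contract of such a callback concrete (return the model info to let it through, return None to skip it), here is a self-contained toy registry. It only mimics the behaviour described above; it is not ClearML's implementation, and `ToyWeightsHandler`, `ModelInfo`, and the "ckpt" naming rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ModelInfo:
    # Hypothetical stand-in for WeightsFileHandler.ModelInfo
    local_model_path: str


class ToyWeightsHandler:
    _pre_callbacks: List[Callable[[str, ModelInfo], Optional[ModelInfo]]] = []

    @classmethod
    def add_pre_callback(cls, callback):
        cls._pre_callbacks.append(callback)

    @classmethod
    def run_pre_callbacks(cls, load_or_save, model):
        # Each callback may modify the model info, or return None to skip it
        for cb in cls._pre_callbacks:
            model = cb(load_or_save, model)
            if model is None:
                return None  # skipped: nothing gets logged or uploaded
        return model


def skip_checkpoints(load_or_save, model):
    assert load_or_save in ("load", "save")
    # Example rule: don't log intermediate checkpoints (hypothetical naming convention)
    if "ckpt" in model.local_model_path:
        return None
    return model


ToyWeightsHandler.add_pre_callback(skip_checkpoints)
print(ToyWeightsHandler.run_pre_callbacks("save", ModelInfo("model.h5")))     # ModelInfo(local_model_path='model.h5')
print(ToyWeightsHandler.run_pre_callbacks("save", ModelInfo("epoch3.ckpt")))  # None
```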

4 years ago

0 Hello! There Is Great Alternative For Argparse Developed By Facebook For Ml Named

GrievingTurkey78 Actually it is in progress, see the GitHub issue for details:
https://github.com/allegroai/trains/issues/219

4 years ago

0 Hi, I Assume It Is Very Basic But How Can I Add The Model That Is Created In The Training To The Artifacts And To See It In The Models Tab?

Your code should have worked, i.e. you should see the 'model.h5' in the artifacts tab. What do you have there?
It should look something like this one:
https://demoapp.trains.allegro.ai/projects/531785e122644ca5b85b2e19b0321def/experiments/e185cf31b2634e95abc7f9fbdef60e0f/artifacts/output-model

BTW:
To manually register any model:

```python
from trains import Task, OutputModel

task = Task.init('examples', 'my model')
# register a local weights file as an output model of this task
OutputModel(task=task).update_weights('my_best_model.h5')
```

5 years ago

0 What Is The Suggested Way Of Running Trains-Agent With Slurm? I Was Able To Do A Very Naive Setup: Trains-Agent Runs A Slurm Job. It Has The Disadvantage That This Slurm Job Is Blocking A Gpu Even If The Worker Is Not Running Any Task. Is There An Easy Wa

HealthyStarfish45 my apologies, they do have it (this ability needs support for both trains-agent and server) but not in the open-source ...

4 years ago

0 Hi, I Have Another Problem

Hi JitteryCoyote63
What do you have in agent.cuda_version?
(You can see it printed at the beginning of the log.)

5 years ago

0 Hey Trains Riders, This Must Be Something Simple I Am Missing, But Still I Couldn't Realize What The Problem Is. I Am Trying To Run Trains-Agent On My Experiments. Setup Of The Server And The Agent Is Fine, But I Am Struggling To Run Real Experiments (Not

Hi ColossalDeer61 ,
the next trains-agent RC (solving the #196 issue) will also solve the double install issue 🙂

5 years ago

0 Hi, I'm Trying To Get Tensorboard Plots Into The Allegro Trains Server. Although I Followed The Example

Hi TrickyRaccoon92, TensorBoard (TB) data is automatically collected, converted, and stored on the system. The UI uses Plotly to display the data itself (in your web browser).
You still have the original TB protobuf file if you want to dive deeper and debug the data (it is not uploaded automatically, but some users do upload it as an additional artifact on the experiment).
Makes sense?
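If you do want the raw TB files attached to the experiment, a minimal sketch is to collect the event files from the log directory and upload each one with `Task.upload_artifact`. The helper names and log-directory layout are assumptions; TensorBoard event files are conventionally named `events.out.tfevents.*`. The task object is passed in, so the sketch itself does not import trains.

```python
import glob
import os


def find_tb_event_files(logdir):
    """Return all TensorBoard event files under `logdir` (searched recursively)."""
    pattern = os.path.join(logdir, "**", "events.out.tfevents.*")
    return sorted(glob.glob(pattern, recursive=True))


def upload_tb_event_files(task, logdir):
    """Upload every event file as an artifact of `task`; returns the artifact names."""
    uploaded = []
    for path in find_tb_event_files(logdir):
        # Event file names usually embed host and timestamp, so basenames tend to be unique
        name = os.path.basename(path)
        task.upload_artifact(name=name, artifact_object=path)
        uploaded.append(name)
    return uploaded
```

Usage would be along the lines of `upload_tb_event_files(Task.current_task(), "runs/")` at the end of training.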

4 years ago

0 Quick Question: How Can I Clone A Task And Change The Cloned Task Type? I See No Task.Set_Type() Function

```python
from clearml import Task  # on older versions: from trains import Task

t = Task.get_task('aabbcc')
t.update_task(task_data={'task_type': "optimizer"})
```

4 years ago

0 Hi Everyone, I Am Running A Pipeline Using The Autoscaler, I Am Able To Spin Up The Vm Instance Using The Autoscaler And The Docker Is Also Getting Installed In There Perfectly. The Issue I Am Facing Is That During Executing A Pipeline Task While Cloning

On the host machine, or inside the containers that are spinning on the host machine?

one year ago

0 Hi Guys, Firstly, Thank You For Your Efforts And Your Support. I'm Trying To Use Allegro Trains To Handle The Experiments Of A Git Repo. The Repo Is Structured As Follows:

Firstly, thank you for your efforts and your support.

Thanks SmugOx94 !

Are you running trains-agent in docker mode? The aforementioned scripts are executed before the experiment is cloned; they are meant to be part of the docker setup, not a per-experiment script.
You could try to edit the experiment and have:
Working Directory: "."
(that means the root of the repository)

Script Path: "experiments_that_uses_library/train.py"

This will make sure you can do "import l...

4 years ago

0 Question About The Storage Manager. Assuming I Have An Object That Updates Frequently And Always Saved At The Same Path (E.G.

Well, I guess you can say this is definitely not a self-explanatory line 😉
But it is actually asking whether we should extract the archive; think of it as:
```python
if extract_archive and cached_file:
    return cls._extract_to_cache(cached_file, name)
```
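The gist of that condition in a self-contained form: a cached file is only unpacked when the caller asked for extraction; otherwise the cached path is returned unchanged. This is a toy model using `zipfile`, not the StorageManager code; `get_cached_copy` and the `_extracted` folder suffix are hypothetical.

```python
import os
import zipfile


def get_cached_copy(cached_file, extract_archive=False):
    """Return the cached file, or the folder it was extracted into if requested."""
    if extract_archive and cached_file:
        target_dir = cached_file + "_extracted"
        with zipfile.ZipFile(cached_file) as zf:
            zf.extractall(target_dir)
        return target_dir
    # No extraction requested (or nothing cached): return the path as-is
    return cached_file
```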

4 years ago

0 Hi, I Am Getting Following Error While Trying To Checkout A GitHub Repo. Error: Rpc Failed; Curl 56 Gnutls Recv Error (-54): Error In The Pull Function. Fatal: The Remote End Hung Up Unexpectedly Fatal: Early Eof Fatal: Index-Pack Failed Repository Cloni

Hmm, you are missing the entry point in the execution (script path).
Also, as I mentioned, you can have either a git repo or a script in the uncommitted changes, but not both (if you have a git repo, then the uncommitted changes are the git diff).

5 years ago

0 Getting This Error At

You cannot call exit(0) and kill the kernel from the SageMaker notebook.

4 years ago

0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

Yes, that seems to be the case. That said, they should have different worker IDs, agent-0 and agent-1 ...
What's your trains-agent version?

5 years ago

0 Hey, My Name Is Ido, And I Am A New Clearml User. My Goal Is To Monitor The Accuracy Of My Llm Outputs In Production. I Understand That I Can Log Each Iteration With A Binary Output (0 For Incorrect And 1 For Correct), But This Approach Makes The Visual G

I prefer serving my models in-house and only performing the monitoring via ClearML.

clearml-serving is infrastructure for you to run models 🙂
To clarify, clearml-serving runs on your end (meaning this is not SaaS where a third party runs the model).

By the way, I saw there is a project dashboard app which might support the visualization I am looking for. Is it suitable for such use case?

Hmm, interesting. Actually it might; it does collect metrics over time ...
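On the visualization problem raised in the question: a common way to make a stream of 0/1 correctness values readable is to plot accuracy over a sliding window instead of the raw values. A minimal sketch, assuming a window size chosen by you (4 below, just to keep the example small); each resulting value could then be reported as one scalar series, for example with ClearML's `Logger.report_scalar`.

```python
from collections import deque


class RollingAccuracy:
    """Accuracy over the last `window` binary outcomes (1 = correct, 0 = incorrect)."""

    def __init__(self, window=100):
        self.outcomes = deque(maxlen=window)

    def update(self, correct):
        self.outcomes.append(1 if correct else 0)
        return sum(self.outcomes) / len(self.outcomes)


acc = RollingAccuracy(window=4)
for i, outcome in enumerate([1, 1, 0, 1, 0, 0]):
    value = acc.update(outcome)
    # e.g. logger.report_scalar(title="accuracy", series="rolling", value=value, iteration=i)
print(value)  # 0.25
```

A larger window gives a smoother curve but reacts more slowly to a real drop in accuracy.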

one year ago

0 Hey I Use Allegro With Docker Mode. But I Do Not Have Access To Paths Where The Data Are(Data I Use For Training). How Can I Use "Volume Mount" With Allegro?

Hi UnsightlySeagull42
Basically you can get the agent to always add additional arguments for the docker run, such as -v for mounting:
https://github.com/allegroai/clearml-agent/blob/948fc4c6ce1ecf33a74619ad570d69b8188f6db9/docs/clearml.conf#L133
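The linked setting (presumably `extra_docker_arguments` in the agent section) takes a list of extra `docker run` arguments, so mounting data comes down to adding `-v host_path:container_path` pairs. The helper below is hypothetical and only builds that list from a mapping, in case you generate the config programmatically; the paths are placeholders.

```python
def docker_volume_args(mounts):
    """Turn {host_path: container_path} into ["-v", "host:container", ...]."""
    args = []
    for host_path, container_path in mounts.items():
        args += ["-v", f"{host_path}:{container_path}"]
    return args


print(docker_volume_args({"/mnt/datasets": "/data"}))
# ['-v', '/mnt/datasets:/data']
```

That output corresponds to a config entry such as `extra_docker_arguments: ["-v", "/mnt/datasets:/data"]`.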

4 years ago

0 Hi, I Am Trying To Use Agent With A Sample, Very Simple Task. But It Stucks And Task Does Not Finish. In Ui In Console I See What I Pasted On Image. Do You Know What I Might Be Doing Wrong? Agent Is Run In Virtual Env Mode

do I need to have the repo that I am running on my account

If it is a public repo, then no need, credentials are only needed for private repos 🙂
Am I missing something?

2 years ago

0 Hi, I'm Using The Dockerized Version Of Trains To Get An Understanding Of Trains. While Trying To Play With The Trains.Conf Settings In ~/Trains.Conf I Got In A State, Where The Agent Has Not Been Able To Clone My Repo From

WickedGoat98 I suspect the main difference is that with GitHub you are cloning over HTTPS (i.e. no credentials needed), but with GitLab you are using SSH authentication to clone the repository. If you can "git clone" your repository from the command line on the machine running the trains-agent, the trains-agent should be able to do the same (basically, make sure you have the SSH keys in your ~/.ssh folder).

Are you testing the trains-agent service from (i.e. from the docker compose) o...

4 years ago

0 I Have Set

im not running in docker mode though

Hmmm, that might be the first issue. The agent cannot skip venv creation; it can, however, use a pre-existing venv (but it will change it every time it installs a missing package).
So setting CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 in non-docker mode has no effect.

one year ago

Hmm, I see. Add this to the agent section of your clearml.conf, for example:

`extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]`


one year ago

0 What Is The Recommended Way To Stop The Execution Of A Specific Agent? This Command Doesn't Allow Me To Specify The Agent Ip I Want To Stop:

Maybe this one?
https://github.com/allegroai/clearml/issues/448
I think it is already there (i.e. in 1.1.1).

3 years ago

0 I Have A Second Question As Well, Is It Possible To Disable Any Parts Of The Automagical Logging? In My Project I Use Both Config And Argparse. It Works By Giving Path To A Config File As A Console Argument And Then Allow The User To Adjust Values With Mo

Hi UnsightlyShark53, just a quick FYI: you can also log the entire config file (config.json). It will be stored as the model configuration, and you can see it in the input/output models under the artifacts tab.
See example here: you can pass either the path to the configuration file or the dictionary itself after you have loaded the JSON, whichever is more convenient :)
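Passing the path or the already-loaded dictionary should end up as the same stored configuration. As a small illustration of that equivalence (a hypothetical helper, not part of trains), normalizing both inputs to a dictionary looks like this:

```python
import json
import os


def load_config(path_or_dict):
    """Accept a path to a JSON config file or an already-loaded dict; return a dict."""
    if isinstance(path_or_dict, dict):
        return path_or_dict
    if isinstance(path_or_dict, (str, os.PathLike)):
        with open(path_or_dict) as f:
            return json.load(f)
    raise TypeError("expected a dict or a path to a JSON file")
```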

5 years ago

0 Can Anyone Point Me To Web Ui Source Code Out, I’m Wondering How Can I Customize Web App Ui A Little Bit

Hi MagnificentPig49, unfortunately it's only in the trains-server docker; we are working on making it "presentable" and uploading it to its own repo.
It's written in Angular (v8, I think). Do you want to help out? It will definitely incentivize the guys to tidy up the code and upload it :)

5 years ago

0 Hi Again. As I Am Running My Experiment From Server Using Agent, I Am Failing On The Point, Where The Arguments Of Argparse Are Processed. When Is The Agent Task Registered. I Am Getting None For Task.Current_Task() At The Beginning Of My Script.

Hi WorriedParrot51
Let me shed some light on this complicated mechanism, because it is not very straightforward.
Basically, the agent signals the trains package that it should ignore the code calls and use a specific Task in the backend. In manual mode, the trains package logs the data into the trains-server; in agent (remote) mode, it does the opposite and takes the data from the trains-server "into" the code.

Specifically, just like in manual mode, calling argparse.parse is be...
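A toy model of the two modes described above, assuming a plain dict stands in for the trains-server and a `remote` flag stands in for the agent's signal (both hypothetical): in manual mode the parsed arguments are written to the store, and in remote mode the stored values override what argparse parsed. This only illustrates the direction of the data flow; it is not how trains hooks argparse internally.

```python
import argparse


def parse_with_backend(parser, argv, backend, remote):
    """Parse `argv`; log to `backend` (manual mode) or override from it (remote mode)."""
    args = parser.parse_args(argv)
    if remote:
        # Agent (remote) mode: values stored on the server win over the command line
        for key, value in backend.items():
            setattr(args, key, value)
    else:
        # Manual mode: record what the code parsed so it shows up on the server
        backend.update(vars(args))
    return args


parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.1)

server = {}
parse_with_backend(parser, [], server, remote=False)
print(server)  # {'lr': 0.1}

server["lr"] = 0.01  # e.g. edited in the UI before enqueueing a clone
print(parse_with_backend(parser, [], server, remote=True).lr)  # 0.01
```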

5 years ago

0 Hello Folks. We're A Small Team Currently Considering Adopting Clearml For Experiment Tracking. I Was Wondering If I Start With The Hosted Service And Decide To Switch To A Self-Hosted Server Later, Is There A Way To Export All The Experiments/Data/Etc Fr

Regulatory reasons and proprietary data is what I had in mind. We have some projects that may need to be fully self hosted in the end

If that is the case, then yes, go self-hosted, or talk to ClearML sales about the VPC option; SaaS is just not the right option.

I might take a look at it when I get a chance but I think I'd have to see if ClearML is a good fit for our use case before I can justify the commitment

I hope it is 🙂

2 years ago

0 Hi There

Thanks !!!

5 years ago

0 Hello, In The Following Context:

For artifacts already registered, simply returns the entry; for artifacts that do not exist, contacts the server to retrieve them

This is the current state.
Downloading the artifacts is done only when actually calling get()/get_local_copy()
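The behaviour described above (accessing an artifact entry is cheap; the actual download happens only on `get()`/`get_local_copy()`) is a lazy-loading pattern. A self-contained toy version, where `LazyArtifact` and the `fetch` callable are hypothetical stand-ins for the real artifact object and the storage download; the caching on repeat calls is a property of this toy, not a claim about ClearML.

```python
class LazyArtifact:
    """Holds artifact metadata; downloads only when a local copy is requested."""

    def __init__(self, name, url, fetch):
        self.name = name
        self.url = url
        self._fetch = fetch  # callable(url) -> local path, i.e. the actual download
        self._local_path = None

    def get_local_copy(self):
        # Download on first access only, then reuse the cached path
        if self._local_path is None:
            self._local_path = self._fetch(self.url)
        return self._local_path


downloads = []


def fake_fetch(url):
    downloads.append(url)
    return "/tmp/cache/" + url.rsplit("/", 1)[-1]


art = LazyArtifact("data", "s3://bucket/data.csv", fake_fetch)
print(len(downloads))        # 0 (creating the entry downloads nothing)
print(art.get_local_copy())  # /tmp/cache/data.csv
art.get_local_copy()
print(len(downloads))        # 1 (the second call reuses the cached copy)
```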

5 years ago
