Hi UnsightlySeagull42
How can I reproduce this behavior?
Are you getting all the console logs?
Is it only the Tensorboard that is missing?
😞 It's working as expected for me...
That said, I tested on Linux & pip.
Any specific requirements to test with? From the log I see this is conda on Windows; are you using the base conda env or a venv inside conda?
okay, just so I understand, this is what you have on your client that can connect with the server:
api {
    api_server:
    web_server:
    files_server:
    credentials {"access_key": "KEY", "secret_key": "SECRET"}
}
Hi ThickDove42,
Yes, but by the time you are able to access it, it will be in display form (plotly), which is not very convenient.
If this is something you need to re-use, I would argue that it is an artifact and should be stored as an artifact (then accessing it is transparent). Obviously you can both report it as a table and upload it as an artifact, no harm in that.
what do you think?
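For example, a minimal sketch of doing both (the pandas DataFrame and the project/task names here are just placeholders):
import pandas as pd
from clearml import Task

# placeholder project/task names, only for illustration
task = Task.init(project_name="examples", task_name="table demo")
df = pd.DataFrame({"epoch": [1, 2], "loss": [0.5, 0.3]})

# report as a table: shows up as a plotly table in the UI (display only)
task.get_logger().report_table(title="results", series="metrics", iteration=0, table_plot=df)

# also upload as an artifact, so it can be re-used programmatically later
task.upload_artifact(name="results_df", artifact_object=df)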
If you spin two agents on the same GPU, they are not aware of one another... So this is expected behavior...
Make sense?
You mean I can do Epoch001/ and Epoch002/ to split them into groups and get a 100 limit per group?
yes then the 100 limit is per "Epoch001" and another 100 limit for "Epoch002" etc. 🙂
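Something along these lines should do it, a minimal sketch (the project/task names and titles are just placeholders), where each title prefix becomes its own group:
from clearml import Task
import numpy as np

# placeholder project/task names, only for illustration
task = Task.init(project_name="examples", task_name="debug samples demo")
logger = task.get_logger()

img = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
# each title prefix ("Epoch001", "Epoch002", ...) is treated as a separate group,
# so the per-group history limit applies to each one independently
logger.report_image(title="Epoch001/samples", series="sample", iteration=0, image=img)
logger.report_image(title="Epoch002/samples", series="sample", iteration=0, image=img)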
TRAINS_WORKER_NAME=first_agent trains-agent --gpus 0
and
TRAINS_WORKER_NAME=second_agent trains-agent --gpus 0
curl seems okay, but this is odd https://<IP>:8010
it should be http://<IP>:8008
Could you change and test?
(meaning change the trains.conf and run trains-agent list)
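i.e. the relevant part of trains.conf would look roughly like this (a sketch, keeping your IP placeholder):
api {
    # plain http on port 8008, not https://<IP>:8010
    api_server: http://<IP>:8008
}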
Specifically your error seems to be an issue with the nvidia Triton container upgrade
For .git-credentials remove the git_pass/git_user from the clearml.conf
If you want to use ssh you need to also add:
force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/a2db1f5ab5cbf178840da736afdc370cfff43f0f/docs/clearml.conf#L25
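i.e. the agent section of your clearml.conf would look roughly like this (a sketch, only the relevant keys shown):
agent {
    # leave git_user / git_pass unset so ~/.git-credentials (or ssh keys) are used
    # git_user: ""
    # git_pass: ""

    # force all git urls to ssh so the agent clones over ssh
    force_git_ssh_protocol: true
}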
you can also increase the limit here:
https://github.com/allegroai/clearml/blob/2e95881c76119964944eaa0289549617e8afeee9/docs/clearml.conf#L32
I think you are correct, this is odd, let me check ...
clearml will register conda packages that cannot be installed if clearml-agent is configured to use pip. So although it is nice that a complete package list is tracked, it makes it cumbersome to rerun the experiment.
Yes mixing conda & pip is not supported by clearml (or conda or pip for that matter)
Even python package version numbers might not exist on both.
We could add a flag not to update back the pip freeze, it's an easy feature to add. I'm just wondering about the exact use case
preinstalled in the environment (e.g. nvidia docker). These packages may not be available via pip, so the run will fail.
Okay that's the part that I'm missing, how come in the first run the packages existed and in the cloned Task they are missing? I'm assuming agents are configured basically the same (i.e. docker mode with the same network access). What did I miss here?
ReassuredTiger98 both are running with pip as package manager, I thought you mentioned conda as package manager, no?
agent.package_manager.type = pip
Also the failed execution is looking for "ruamel_yaml_conda" but it is nowhere to be found on the original one?! How is that possible?
(also could you make sure all posts regarding the same question are put in the thread of the first post to the channel?)
Hi JealousParrot68
You mean by artifact names ?
Hmm SuccessfulKoala55 what do you think?
I did not start with python -m, as a module. I'll try that
I do not think this is the issue.
It sounds like anything you do on your specific setup will end with the same error, which might point to a problem with the git/folder ?
@<1546303293918023680:profile|MiniatureRobin9>
, not the pipeline itself. And that's the last part I'm looking for.
Good point, any chance you want to PR this code snippet ?
def add_tags(self, tags):
    # type: (Union[Sequence[str], str]) -> None
    """
    Add Tags to this pipeline. Old tags are not deleted.
    When executing a Pipeline remotely (i.e. launching the pipeline from the UI/enqueuing it), this method has no effect.
    :param tags: A li...
Nice! I'll see if we can have better error handling for it, or solve it altogether 🙂
PompousBeetle71 could you check that the "output:destination" is the same for both experiments ?
Hmmm, are you running inside pycharm, or similar ?
Hi @<1546303293918023680:profile|MiniatureRobin9> could it be the pipeline logic is created via the clearml-task CLI? If this is the case, I think this is an edge case we should fix. Basically it creates a Task instead of a pipeline, which in essence only affects the UI. To solve it, just run the pipeline locally, notice that by default when you start it, it will actually stop the local run and relaunch itself on an agent.
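For example, a minimal sketch of defining and starting the pipeline logic from a local script (project/step names here are just placeholders):
from clearml import PipelineController

# placeholder names, only to show the pattern
pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="training task",
)

# running the script locally: start() registers the pipeline and by default
# stops the local run and relaunches the pipeline logic on an agent
pipe.start()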
Also, could you open a GitHub issue so we add a flag for it?
Hi ReassuredTiger98
When clearml is running inside the docker, the installed packages shown in the WebUI get updated.
Yes, this is by design, so the agent can always reproduce the exact python environment.
(internally the original requirements are also stored, but not available in the UI).
What exactly is the use case here? Wouldn't it make sense to reproduce the entire working environment when you clone the executed Task?
Then when run a second time, the task will contain the requirements of the (conda-) environment from the first run.
What you see in the log under "Summary - installed python packages:" will be exactly what is updated on the Task. But it does not contain the "ruamel_yaml_conda" package, this is what I cannot get...
But I did find this part:
ERROR: conda 4.10.1 requires ruamel_yaml_conda>=0.11.14, which is not installed.
Which points to conda needing this package and then failing to i...
Hi ReassuredTiger98
Could you send the logs of both runs?
(I'm not sure this is a bug, or some misconfiguration , but the scenario should have worked...)
I think my main point is: k8s glue on AKS or GKE basically takes care of spinning new nodes, as the k8s service does that. The AWS autoscaler is kind of a replacement, make sense?