yes that makes sense, I think what happened is one of the processes completed the Task (i.e. closed it) before the others did, and so they threw an exception.
I switched to having all tasks in separate processes
I think that's probably the best (performance wise as well), nice!
Hm, one of the issues I have with this change is that now every dataset that doesn’t have a semantic version cannot be loaded anymore
Okay we definitely need to solve that.
Any chance I can ask you to open a GitHub issue (just so we don't forget)?
I will pass it along quickly so that we can maybe offer a fix in the next RC
Seems the API has changed quite a bit over the last few versions.
Correct, notice that your old pipeline Tasks use the older package and will still work.
There seems to be no need for controller_task anymore, right?
Correct, you can just call pipeline.start() 🙂
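If it helps, a minimal sketch (project / task / queue names below are placeholders):
```
from clearml import PipelineController

# Build the controller - no separate controller_task needed
pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")

# Each step clones an existing Task (names here are placeholders)
pipe.add_step(
    name="stage_data",
    base_task_project="examples",
    base_task_name="data preparation",
)

# start() enqueues the steps and runs the DAG
pipe.start(queue="services")
```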
The pipeline creates the tasks, but never executes/enqueues them (they are all in Draft mode). No DAG graph appears in the RESULTS/PLOTS tab.
Which vers...
Do we support GPUs in a) docker mode b) k8s glue?
yes on both
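e.g. for docker mode, something along these lines should work (queue name and image are placeholders):
```
clearml-agent daemon --queue gpu_queue --docker nvidia/cuda:11.0-base --gpus all
```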
Is there a good reference to get started with k8s glue?
A few folks here already set it up, do you have a k8s cluster with GPU support?
LazyLeopard18 are you using the StorageManager to access azure:// links?
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
Are you running in docker mode?
If so you can actually delete mapped files (they will still be available inside the docker), just make sure you delete them X hours after they were created, and you should be fine.
wdyt?
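e.g. a cleanup sketch (untested; the cache path and the age threshold are placeholders):
```
import time
from pathlib import Path

CACHE_DIR = Path("/opt/clearml/agent/cache")  # placeholder: your mapped cache folder
MAX_AGE_HOURS = 24  # the "X hours" - pick something above your longest job

cutoff = time.time() - MAX_AGE_HOURS * 3600
for f in CACHE_DIR.rglob("*"):
    # Deleting on the host is fine: running dockers keep the mounted copy
    if f.is_file() and f.stat().st_mtime < cutoff:
        f.unlink()
```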
I have the agent configured to force install requirements.txt
what do you mean by that?
If you edit the requirements to have https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl
No, I mean actually compare using the UI; maybe the arguments are different, or the "installed packages" are.
Is there a reason clearml will use the demo server when there is no ~/clearml.conf?
It's the default server for an easy getting-started journey, e.g. you run some sample code and it works, with zero configuration.
That said, you can set an environment flag to disable the default server behavior: CLEARML_NO_DEFAULT_SERVER=1
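e.g. (sketch - the flag has to be set before clearml is imported):
```
import os

# Opt out of the public demo-server fallback when no ~/clearml.conf exists
os.environ["CLEARML_NO_DEFAULT_SERVER"] = "1"

from clearml import Task  # import only after the flag is set
```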
ReassuredTiger98
wdyt?
BTW:
it will push potentially proprietary data to the public demo server.
The server if su...
I was just able to reproduce with "localhost"
But this is not a copy, this is a mount; your log showed cp failing
Can you also make sure you did not check "Disable local machine git detection" in the ClearML PyCharm plugin?
If nothing specific comes to mind i can try to create some reproducible demo code (after holiday vacation)
Yes please! 🙏
In the meantime, see if the workaround is a valid one
AstonishingSeaturtle47 , makes sense?
Hi @<1661542579272945664:profile|SaltySpider22> I'm not sure I understand the answer to my parallel question
I have no idea what string reference could be used when steps come from a Task?
Oh I see, you are correct: when it comes to Tasks, the assumption is you are passing strings (with selectors on the strings, i.e. the curly brackets), but there is no fancy serialization/deserialization as you have with pipelines from decorators / functions. The reason for that is that the Task itself is standalone, there is no way for the pipeline logic to actually "pull data" from it and "pass" it to the o...
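For reference, a sketch of the string-selector style (step / parameter names are placeholders; pipe is a PipelineController as in the earlier sketch):
```
# Wire a Task-based step's output into the next step via string selectors
pipe.add_step(
    name="stage_train",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="training",
    parameter_override={
        # The curly-bracket selector is resolved at runtime from the parent step
        "General/dataset_url": "${stage_data.artifacts.dataset.url}",
    },
)
```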
When looking at the worker details, it says "No queues currently assigned to this worker"
Yes, I think we should have better information there. The "AWS service" is not directly pulling jobs from any specific queue, hence nothing there. It is "listening" to queues and launching machines; those machines will be listening to the queue. I wonder if it is just easier to also make sure it is listed as "assigned" to those queues. wdyt?
Hi @<1576381444509405184:profile|ManiacalLizard2>
If you make sure all server access is via a host name (i.e. instead of IP:port, use host_address:port), you should be able to replace it with cloud host on the same port
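e.g. in clearml.conf (hostnames are placeholders, ports are the defaults):
```
api {
    # host names instead of raw IPs keep the server relocatable
    web_server: http://clearml.example.com:8080
    api_server: http://clearml.example.com:8008
    files_server: http://clearml.example.com:8081
}
```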
FreshParrot56 we could add this capability, but the main caveat is that if your version depends on multiple parent versions, you still need to download and extract all the parent versions, which means that when you clear them you might hurt later performance. Does that make sense? What is the use-case / scenario for you?
That would be great! Might have to use 2>/dev/null in some of my bash scripts
Feel free to test and PR :)
One other question regarding connecting. We have set up sshd inside the docker image we are using.
Actually the remote session opens port 10022 on the host machine (so it does not collide with the default ssh port)
It actually runs an additional sshd inside the docker, setting its port.
And the clearml-session will ssh directly into the container sshd...
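i.e. something along the lines of (queue/image are placeholders):
```
clearml-session --queue default --docker nvidia/cuda:11.0-base
```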
GitLab has support for S3-based cache, btw.
This might still be considered "slow" compared to local-dist/cluster mount
Would adding support for some sort of post task script help? Is something already there?
Interesting, can you expand on the use case? (currently there is only pre-task script, for setup)
Basically if I pass an arg with a default value of False, which is a bool, it'll run fine originally, since it just accepted the default value.
I think this is the nargs="?", is that right?
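For context, the usual pattern and its gotcha, as a sketch:
```
import argparse

parser = argparse.ArgumentParser()
# nargs="?": a bare --flag uses `const`, omitting --flag uses `default`
parser.add_argument("--flag", type=bool, nargs="?", const=True, default=False)

print(parser.parse_args([]).flag)                   # False (default)
print(parser.parse_args(["--flag"]).flag)           # True  (const)
# Gotcha: bool("False") is True, so explicit string values mislead
print(parser.parse_args(["--flag", "False"]).flag)  # True!
```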
So the original looks good, could it be you tried to clone a Task that was executed with an agent with pip, and then pushed into an agent running conda?
Sure thing!
BTW: not sure if it helps, but the SaaS version integrates with Genesis Cloud; I know they provide cheap GPUs, might be worth checking
See here: https://download.pytorch.org/whl/torch_stable.html
cu110/* has no torch 1.3.1, only 1.7.0
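i.e. if moving to 1.7.0 is an option, something like this should resolve (sketch):
```
pip install torch==1.7.0+cu110 -f https://download.pytorch.org/whl/torch_stable.html
```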