yes you are correct, I would expect the same.
Can you try manually importing pt, and maybe also moving the Task.init before darts?
So when the agent fires up, it gets the hostname, which you can then get from the API.
I think it does something like "getlocalhost", a python function that is OS agnostic
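If it helps, here is a minimal sketch of an OS-agnostic hostname lookup in Python, assuming the agent uses something like the stdlib `socket` module (the exact function the agent calls is paraphrased from memory above):

```python
# OS-agnostic local hostname lookup via the Python standard library.
# Assumption: the agent does something equivalent internally.
import socket

hostname = socket.gethostname()
print(hostname)
```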
Try to upload something to the file server?
Hi @<1603198134261911552:profile|ColossalReindeer77>
I would also check this one: None
check if the fileserver container is running with `docker ps`
Any chance you can PR a fix to the docs?
Thank you so much!! 🤩
/opt/clearml/data/fileserver
this is on the host machine, and it is mounted into the container at /mnt/fileserver
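For reference, a hedged sketch of what that volume mapping typically looks like in the clearml-server docker-compose file (service name and compose layout may differ in your deployment, so check your actual file):

```
# Illustrative excerpt only - verify against your docker-compose.yml
services:
  fileserver:
    volumes:
      - /opt/clearml/data/fileserver:/mnt/fileserver  # host_path:container_path
```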
For future readers, see discussion here:
https://clearml.slack.com/archives/CTK20V944/p1629840257158900?thread_ts=1629091260.446400&cid=CTK20V944
Hi @<1545216070686609408:profile|EnthusiasticCow4>
Many of the datasets we work with are generated by SQL query.
The main question in these scenarios is: are those DBs stable?
By that I mean, generally speaking, DBs serve applications, and from time to time they undergo migrations (i.e. changes in schema, more/less data, etc.).
The most stable way is to create a script that runs the SQL query and creates a clearml dataset from it (that script becomes part of the Dataset, to have full tracta...
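A minimal sketch of that pattern, where every name, path, and the sqlite backend are illustrative assumptions (`Dataset` is the clearml API, which needs a reachable server to actually run):

```python
# Sketch: snapshot a SQL query result to a file, then version that
# file as a ClearML Dataset. All names/paths here are hypothetical.
import csv
import sqlite3


def query_to_csv(db_path, sql, out_csv):
    """Run the SQL query and write the result set to a CSV file."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(sql)
        with open(out_csv, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cur.description])
            writer.writerows(cur)
    finally:
        con.close()


def publish_as_dataset(csv_path, name, project):
    """Version the snapshot with ClearML (requires a reachable server)."""
    from clearml import Dataset

    ds = Dataset.create(dataset_name=name, dataset_project=project)
    ds.add_files(csv_path)
    ds.upload()
    ds.finalize()
    return ds.id
```

In practice the script itself can be added to the dataset alongside the snapshot, so each version records exactly how it was produced.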
to add an init script or to expand its capacity,
@<1546665634195050496:profile|SolidGoose91> I seem to see it in the wizard here, what am I missing?
btw: what's the OS and python version?
I'm trying to get a task to run using a specific docker image and to source a bash script before execution of the python script.
Are you running an agent in docker mode? If so, you should be able to see the output of your bash script first thing in the log
(and it will appear in the docker CMD)
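If the bash script needs to run inside the container before the task starts, the agent config has an init-script section; this is a sketch from memory, so verify the key name against the reference clearml.conf for your agent version:

```
agent {
    # commands executed inside the docker before the task starts
    docker_init_bash_script: [
        "echo 'agent init'",
        "source /opt/setup_env.sh",  # hypothetical script path
    ]
}
```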
Hi PompousParrot44
So do you mean something like:
```python
task_model_a = Task.get_task(task_id='id_a')
task_model_b = Task.get_task(task_id='id_b')
model_a_file = task_model_a.models['output'][-1].get_local_copy()
model_b_file = task_model_b.models['output'][-1].get_local_copy()
```
Hmm, I see what you mean. It is on the roadmap (ETA the next version, 0.17; 0.16 is due in a week or so) to add multiple models per Task, so it is easier to see the connections in the UI. I'm assuming this will solve the problem?
Hi WackyRabbit7 ,
Yes we had the same experience with kaggle competitions. We ended up having a flag that skipped the task init :(
Introducing offline mode is on the to-do list, but to be honest it has been there for a while. The thing is, since the Task object actually interacts with the backend, creating an offline mode means simulating the backend response. I'm open to hacking suggestions though :)
(since you are using venv mode, if the cuda is not detected at startup time, it will not install the GPU version, as it has no CUDA support)
What do you see in the console when you start the trains-agent? It should detect the CUDA version.
This can also be set on the command line:
--cpu-only, or maybe without any --gpus flag at all
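For example (the queue name here is just an illustration):

```
# force CPU-only execution for this agent
trains-agent daemon --queue default --cpu-only
```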
cuda 10.1, I guess this is because no wheel exists for torch==1.3.1 and cuda 11.0
Correct
how can I enforce a specific wheel to be installed?
You mean like specific CUDA wheel ?
you can simply put the http link to the wheel in the "installed packages", it should work
What you actually specified is torch; the @ is kind of a pip remark, and pip will not actually parse it 🙂
use only the link https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl
Hi JitteryCoyote63
What do you have in agent.cuda_version?
(you can see it printed at the beginning of the log)