AgitatedDove14

48 Questions, 8051 Answers

Active since 10 January 2023

Last activity 8 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8051

0 Hi, I Try To Optimize My Hyperparameters With

BTW

/home/local/user/.clearml/venvs-builds/3.7/bin/python: can't open file 'train.py': [Errno 2] No such file or directory

This error is from the agent, correct? It seems it did not clone the correct code. Is train.py committed to the repository?

3 years ago

0 Hi, I Try To Optimize My Hyperparameters With

Hmm, maybe the original Task was executed with an older version (before section names were introduced)?
Let's try:
DiscreteParameterRange('epochs', values=[30]),
Does that give a warning?
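
If the name mismatch is the cause, it may help to compare the legacy name with a section-prefixed one. A minimal sketch, assuming the parameter lives in a section called `General` (that section name is an assumption, check the Task's configuration tab for the real one). `epochs_candidates` is a hypothetical helper, not part of clearml; only `DiscreteParameterRange` comes from `clearml.automation`.

```python
def epochs_candidates(section="General", range_cls=None):
    """Build the legacy and the section-prefixed definition of `epochs`.

    `section` is an assumed section name; `range_cls` lets you inject a
    stand-in class so the helper can be tried without clearml installed.
    """
    if range_cls is None:
        # Imported lazily: only needed when no stand-in class is supplied.
        from clearml.automation import DiscreteParameterRange
        range_cls = DiscreteParameterRange
    return {
        # Name as used before section names were introduced.
        "legacy": range_cls("epochs", values=[30]),
        # Name with the section prefix.
        "sectioned": range_cls(f"{section}/epochs", values=[30]),
    }
```

Passing one entry at a time to the optimizer shows which of the two names triggers the warning.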

3 years ago

0 Hello Everyone, I Deployed Clearml (

Hi AgitatedTurtle16 could you verify you can access the API server with curl?
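
If curl is not available on the machine, roughly the same check can be done from Python. This is only a sketch: `can_reach` is a hypothetical helper, and the URL in the usage note assumes the API server listens on port 8008, so replace it with your own address.

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=5.0):
    """Return True if anything answers at `url`, even with an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False
```

For example, `can_reach("http://localhost:8008")` returning False would point to a network or deployment problem rather than a credentials problem.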

3 years ago

0 Hi, I Have A File On Azure Blob, Which Will Be A Parent For Some Experiments, Which In Every One Of Them I Will Manipulate The Orig File. Now I Want To Create A Dataset, Define The Orig File As The Parent, And Then, While Creating Each Of The New Files, D

Notice the parents argument when creating a new Dataset
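
A minimal sketch of that flow, assuming the parent dataset already exists and you have its ID. `create_child_dataset` and all argument values are hypothetical; the clearml calls used are `Dataset.create` (with `parent_datasets`), `add_files`, `upload` and `finalize`.

```python
def create_child_dataset(parent_id, file_path, name, project, dataset_cls=None):
    """Create a dataset version that inherits from `parent_id` and adds one file.

    `dataset_cls` is only there so a stand-in class can be injected;
    by default the real clearml Dataset class is used.
    """
    if dataset_cls is None:
        # Imported lazily: only needed when no stand-in class is supplied.
        from clearml import Dataset
        dataset_cls = Dataset
    child = dataset_cls.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_id],  # the child inherits the parent's files
    )
    child.add_files(path=file_path)  # register the manipulated file
    child.upload()                   # push the new/changed files to storage
    child.finalize()                 # close this dataset version
    return child
```

Calling it once per experiment, each time with the same `parent_id`, gives one child dataset per manipulated file.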

3 years ago

0 Hi All, Looking For Some Help When Executing Pipelines With Custom Docker Images. I Have A Component Defined And I Expect Its Python Runtime Environment To Be Managed By A Custom Docker Image (

Hi WickedStarfish97

As a result, I don’t want the Agent to parse what imports are being used / install dependencies whatsoever

Nothing to worry about here: even if the agent detects the python packages, they are installed on top of the preexisting packages inside the docker. That said, if you want to override it, you can also pass packages=[]

2 years ago

0 Hi All, Looking For Some Help When Executing Pipelines With Custom Docker Images. I Have A Component Defined And I Expect Its Python Runtime Environment To Be Managed By A Custom Docker Image (

Hmm, maybe a different numpy version? (numpy==1.22.1; maybe the Task needs a different version.) Can you post the Task log?

2 years ago

0 Hi There, I Used

JitteryCoyote63

Should be added before the

if __name__ == "__main__":

?

Yes, it should.
From your code I understand it is not?
What's the clearml version you are using ?

2 years ago

0 Hi, Anyone Also Stuck With The Exception Encountered Uploading Pytorch Model File? The Dataset Upload Works Fine, Though.

Hi BitterStarfish58
What's the clearml version you are using ?

dataset upload both work fine

Artifacts / Datasets are uploaded correctly ?
Can you test if it works if you change " http://files.community.clear.ml " to " http://files.clear.ml " ?

2 years ago

0 Is Anyone Also Experiencing Network Error During Every Clearml Dataset Download? It's Been A While And Almost Every Download Fails...

Could it be that the file upload itself was broken?

2 years ago

0 Hi All, I Am Starting To Use Clearml-Agent. Run It With

👍

3 years ago

0 Is Anyone Also Experiencing Network Error During Every Clearml Dataset Download? It's Been A While And Almost Every Download Fails...

Thanks BitterStarfish58 !

2 years ago

0 Is Anyone Also Experiencing Network Error During Every Clearml Dataset Download? It's Been A While And Almost Every Download Fails...

So are you saying the large file size download is the issue ? (i.e. network issues)

2 years ago

0 Is Anyone Also Experiencing Network Error During Every Clearml Dataset Download? It's Been A While And Almost Every Download Fails...

Hmm maybe we should add a test once the download is done, comparing the expected file size and the actual file size, and if they are different we should redownload ?
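
Something along these lines, just to illustrate the idea. This is not existing clearml code: `download_with_size_check` is a hypothetical helper, and `download` stands for whatever callable actually fetches the file.

```python
import os


def download_with_size_check(download, dest_path, expected_size, max_attempts=3):
    """Call `download(dest_path)` until the file on disk has `expected_size` bytes.

    Returns the number of attempts used; raises IOError if every attempt
    ends with a file of the wrong size.
    """
    for attempt in range(1, max_attempts + 1):
        download(dest_path)
        actual = os.path.getsize(dest_path) if os.path.exists(dest_path) else -1
        if actual == expected_size:
            return attempt
        # Size mismatch: drop the partial file so the next attempt starts clean.
        if os.path.exists(dest_path):
            os.remove(dest_path)
    raise IOError(
        "size mismatch after %d attempts: expected %d bytes" % (max_attempts, expected_size)
    )
```

A size check only catches truncated downloads; comparing a hash would also catch corrupted content, at the cost of reading the whole file again.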

2 years ago

0 Is Anyone Also Experiencing Network Error During Every Clearml Dataset Download? It's Been A While And Almost Every Download Fails...

I'm not sure the files-server supports "continue" from last position...

2 years ago

0 Is Anyone Also Experiencing Network Error During Every Clearml Dataset Download? It's Been A While And Almost Every Download Fails...

Hi BitterStarfish58
Where are you uploading it to?

2 years ago

0 Hi

task.models["outputs"][-1].tags (plural, a list of strings) and yes I mean the UI 🙂

I get the n_saved part. What's missing for me is how you would tell TrainsLogger/Trains that the current one is the best. Or are we assuming the last saved model is always the best? (In that case there is no need for a tag; you just take the last in the list.)

If we are going with: "I'm only saving the model if it is better than the previous checkpoint" then just always use the same name i.e. " http:/...

4 years ago

0 Hi, Is It Possible To Sync Experiments Using S3 Or GS? I'd Love To Have A Look At Some Documentation. We Want To Sync The Trainings While They Are Running [Not Just When They Are Finished] Thanks,

Is there a solution for that?

Hi DisturbedElk70
Well assuming you mount/sync the "temp" folder of the offline experiment to a storage solution, then have another process (on the other side), syncing these folders, it will work and you will get "real-time" updates 🙂
Offline Folder:
get_cache_dir() / 'offline' / task_id
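
To make the idea concrete, here is a rough sketch of the "syncing" side using a plain local copy as a stand-in for the real storage sync (S3/GS tooling would replace it). `sync_folder` is a hypothetical helper; `src` would be the offline folder above and `dst` the mounted or synced location.

```python
import shutil
from pathlib import Path


def sync_folder(src, dst):
    """Copy files from `src` to `dst` that are missing or changed in `dst`.

    Returns the list of destination paths written in this pass.
    """
    src, dst = Path(src), Path(dst)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        if target.exists():
            s, t = path.stat(), target.stat()
            # Skip files that have the same size and are not older than the source.
            if t.st_size == s.st_size and t.st_mtime_ns >= s.st_mtime_ns:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 keeps the modification time
        copied.append(target)
    return copied
```

Running this in a loop (every few seconds, say) while the offline Task is executing is what gives the "real-time" feel on the other side.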

2 years ago

0 Hi, Is It Possible To Sync Experiments Using S3 Or GS? I'd Love To Have A Look At Some Documentation. We Want To Sync The Trainings While They Are Running [Not Just When They Are Finished] Thanks,

StaleButterfly40 just making sure I understand, are we trying to solve the "import offline zip file/folder" issue, where we create multiple Tasks (i.e. Task per import)? Or are you suggesting the Actual task (the one running in offline mode) needs support for continue-previous execution ?

2 years ago

0 I Have Some Old Training Jobs That I Logged With Tensorboard, Is It Possible To Add Them To Clearml?

I can read them programmatically using tensorboard and then log them using the clearml logger,

StaleButterfly40 this will be a great script to put somewhere (I'm sure you are not the only one with this problem). Maybe put it as a GitHub issue ? wdyt ?
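
As a starting point, such a script could look roughly like this. It is a sketch, not a tested importer: both function names are hypothetical, it only handles scalars, and the split of a tag like `loss/train` into title and series is an assumption about how you want the plots grouped.

```python
def iter_tensorboard_scalars(logdir):
    """Yield (tag, step, value) for every scalar found under `logdir`."""
    # Imported lazily so the replay part below can be used without tensorboard.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    acc = EventAccumulator(logdir)
    acc.Reload()  # parse the event files
    for tag in acc.Tags()["scalars"]:
        for event in acc.Scalars(tag):
            yield tag, event.step, event.value


def replay_scalars(scalars, logger):
    """Report each (tag, step, value) through `logger.report_scalar`.

    A tag such as "loss/train" becomes title "loss", series "train";
    a tag without "/" is used as both title and series.
    """
    count = 0
    for tag, step, value in scalars:
        title, _, series = tag.rpartition("/")
        logger.report_scalar(
            title=title or tag, series=series, value=value, iteration=step
        )
        count += 1
    return count
```

With a Task created via `Task.init`, the call would be `replay_scalars(iter_tensorboard_scalars(logdir), task.get_logger())`, one Task per old training job.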

2 years ago

0 Can I Run A Random Task From A Queue? Like This

I'm running hyperparameter optimization on an LSF cluster where every task is an LSF job running without clearml-agent

WOW this is so cool! 🎊

2 years ago

0 Hi Everyone And Thanks Again For The Help, I Still Have No Success In Running Clearml Agent, It Just Gets Stuck Without Any Output, On Debug Mode For

yes i can communicate with the server, i managed to put tasks in the queue and retrieve them as well as running tasks with metrics reporting

Through the UI or python code ?

2 years ago

0 Hi Everyone And Thanks Again For The Help, I Still Have No Success In Running Clearml Agent, It Just Gets Stuck Without Any Output, On Debug Mode For

Yey!

2 years ago

0 Hi Everyone And Thanks Again For The Help, I Still Have No Success In Running Clearml Agent, It Just Gets Stuck Without Any Output, On Debug Mode For

ChubbyLouse32 could it be the configuration file is not passed to the agent machine itself ?
(Were you able to run anything against this internal server? I mean, to connect to it from code, clearml/clearml-agent?)

2 years ago

0 Hi Everyone And Thanks Again For The Help, I Still Have No Success In Running Clearml Agent, It Just Gets Stuck Without Any Output, On Debug Mode For

This makes no sense to me 😞
Both are reading the exact same file, and using the same session / flow ...
Maybe there is an error with the "verify_certificate" on the agent ?

2 years ago

0 Hi, I Have A Task That's Running On A Docker Container. Now - There Are A Bunch Of Other Docker Containers (Namely, Nvidia's Tf 21.1 To 21.10) For Which I Want To Run The Task. How Can I Do This Using Agents / Remote Execution? Thanks

ImmensePenguin78 this is probably for a different python version ...

3 years ago

0 Hi, I Do The Following:

Many thanks!

3 years ago

0 This Is A Great Tool For Visualizing All Your Experiments. I Wanted To Know That When I Am Logging Scalar Plots With Title As Train Loss And Test Loss They Are Getting Displayed As Train Loss And Test Loss In The Scalar Tab. I Wanted That The Title Shoul

Are you using tensorboard or do you want to log directly to trains ?

4 years ago

0 Hey Guys, I'm Hosting A Private Server Configured To Link To A S3 Bucket. I'm Having Difficulties Identifying The Reason For An Error In A Worker Recurring Repeatedly (Shown In The Screenshots Attached). It Basically Uses The Same Clearml.Conf File As The

Nice catch AverageBee39 🙂