AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 <no title>

Awesome! let me know how/if it works 🙂

3 years ago

0 Different Question About Warnings: I'M Getting (Infrequently) This Warning, Followed By My Script Hanging

that's the entire repo link ? not something like https://github.com/ ... ?

3 years ago

0 Hey! Does Anyone Know If I Can Use Different Ports For My Clearml Ui Server?

This depends on how you spun up the server. Basically, as long as you configure the clients (i.e. the Python clients) correctly, there is no issue.
But the auto-generated configuration might be off (when you create credentials in the UI, it tells clearml-init where the server is and which ports to use).
I would actually recommend subdomains if this is possible
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#sub-domain-configuration
wdyt?
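
To make "configure the clients correctly" a bit more concrete, here is a minimal sketch that just assembles the three server URLs for non-default ports. The host name and port numbers are made-up examples, and `build_server_urls` is a hypothetical helper, not part of ClearML.

```python
def build_server_urls(host, web_port, api_port, files_port, scheme="http"):
    """Return the three URLs a client needs in order to reach the server."""
    base = f"{scheme}://{host}"
    return {
        "web_host": f"{base}:{web_port}",
        "api_host": f"{base}:{api_port}",
        "files_host": f"{base}:{files_port}",
    }


# Assumed example values: a server whose ports were remapped to 9080/9008/9081
urls = build_server_urls("clearml.example.com", 9080, 9008, 9081)
for name, url in urls.items():
    print(f"{name}: {url}")
```

These values would then go into the client's clearml.conf (the api_server, web_server and files_server entries) or into `Task.set_credentials(api_host=..., web_host=..., files_host=...)`.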

2 years ago

0 Hi, I Try To Optimize My Hyperparamters With

Hmm ConvincingSwan15

WARNING - Could not find requested hyper-parameters ['Args/patch_size', 'Args/nb_conv', 'Args/nb_fmaps', 'Args/epochs'] on base task

Is this correct ? Can you see these arguments on the original Task in the UI (i.e. Args section, parameter epochs?)
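
A quick way to check this outside the UI is to compare the requested names against the base task's parameters. The sketch below works on a plain dict; the sample values are invented, and `find_missing_hyperparameters` is a hypothetical helper, not a ClearML function.

```python
def find_missing_hyperparameters(requested, base_task_params):
    """Return the requested names that do not exist on the base task."""
    return [name for name in requested if name not in base_task_params]


requested = ["Args/patch_size", "Args/nb_conv", "Args/nb_fmaps", "Args/epochs"]

# Assumed example: the base task only logged some arguments, one of them
# under a different section than the optimizer expects.
base_task_params = {"Args/epochs": "10", "General/patch_size": "64"}

missing = find_missing_hyperparameters(requested, base_task_params)
print(missing)  # ['Args/patch_size', 'Args/nb_conv', 'Args/nb_fmaps']
```

On a real task the dict would probably come from something like `Task.get_task(task_id=...).get_parameters()`, which as far as I know returns the same "Section/name" style keys.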

3 years ago

0 Hello, I Am Getting `Valueerror: Could Not Get Access Credentials For '

the second seems like a botocore issue :
https://github.com/boto/botocore/issues/2187

3 years ago

0 Hi There, There Seems To Be An Issue In The Web Ui -> Viewing Plots In "View In Experiment Table" Doesn'T Respect The "Scalars To Display" One Sets When Viewing In "View In Fullscreen". Is This A Bug Or Expected Behaviour?

Feel free to add to the UI request list:
https://github.com/allegroai/trains/issues/81

4 years ago

0 Hi, I Would Like To Configure Clearml-Server To Connect To An S3 Bucket In Order To Store Artefacts - I'Ve Taken A Look On This Page

None

3 years ago

0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

(If you are running trains-agent with the exact same command, I think you will get the same worker_id, in which case you will end up with something similar to what you describe.)
To solve it, add TRAINS_WORKER_NAME="new_unique_name" trains-agent ...
I think we resolve it automatically, but based on your description it looks like the same worker name/id is being used multiple times ...
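
If the agents are launched from a script, a minimal sketch of giving each one its own name could look like this. Only the TRAINS_WORKER_NAME variable comes from the answer above; the "hostname:agentN" naming scheme and the `agent_environment` helper are assumptions for illustration.

```python
import os
import socket


def agent_environment(worker_index, base_env=None):
    """Copy the environment and set a unique TRAINS_WORKER_NAME in it."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed naming scheme: any value works as long as it is unique per agent
    env["TRAINS_WORKER_NAME"] = f"{socket.gethostname()}:agent{worker_index}"
    return env


# One environment per agent; each would be passed as env=... to
# subprocess.Popen together with your usual trains-agent command.
envs = [agent_environment(i, base_env={}) for i in range(3)]
names = [env["TRAINS_WORKER_NAME"] for env in envs]
print(names)
```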

4 years ago

0 Hi All, I'M Trying To Deploy Trains On Rancher (Nice Kubernetes Cluster Orchestration Project) Where I'M Quite New To Rancher And Kubernetes. I Have Been Able To Install Trains Using Helm

Will such a docker image need a trains configuration file?

If you need to configure things other than credentials (see above), then yes, you might need to map trains.conf into the pod.
Specifically, if you need to, map your trains.conf to /root/.trains inside the pod/container.

3 years ago

0 I Cloned It And Scheduled It To The Default Queue, But It Is Not Being Processed. Is The Default Queue By Default Not Usable?

WickedGoat98 did you set up a machine with trains-agent pulling from the "default" queue ?

3 years ago

0 I Saw Some Talk Of Clearml + Kedro On Reddit. Is That A Good Approach?

One example is a node that resizes the images: this node receives a Dataset as input, iterates over each image, resizes it and outputs a new Dataset, which is used in the next node downstream in the Pipeline.

I agree, this sounds like a "function" rather than a job, so better suited for Kedro.

organization structure

and see for yourself (this pipeline has two nodes: train_model and predict)

Interesting! let me dive into that and ...

3 years ago

0 After I Have Create A Task And Closed It In A Notebook, Any Activity Seems To Trigger Another Task. For Example:

Okay that actually makes sense, let me check I think I know what's going on

3 years ago

0 Hi Guys, I’M Trying To Install It My Lab Server, But When I Try To Create Credentials, It Says Error And Gives More Info: Error 301 : Invalid User Id: Id=F46262Bde88B4928997351A657901D8B, Company=D1Bd92A3B039400Cbafc60A7A5B1E52B

Yes, let's assume we have a task with id aabbcc
On two different machines you can do the following:
trains-agent execute --docker --id aabbcc
This means you manually spin up two simultaneous copies of the same experiment. Once they are up and running, will your code be able to make the connection between them (i.e. OpenMPI, torch distributed, etc.)?

3 years ago

0 Hi, Thank You So Much For Your Awesome Product! But I Have One Issue, Please Tell Me How To Fix It: I Deployed Clearml-Server On A Corporate Virtual Machine. Its Address 10.68.167.10. I Am Able To Send Requests From All Other Virtual Machines On The Serv

The other way around
- "8011:8008"
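
For reference, the short docker-compose port syntax reads HOST:CONTAINER. The tiny sketch below only spells that out; `parse_port_mapping` is a hypothetical helper for illustration.

```python
def parse_port_mapping(mapping):
    """Split a docker-compose style "HOST:CONTAINER" port string."""
    host_port, container_port = mapping.split(":")
    return {"host": int(host_port), "container": int(container_port)}


# Port 8011 on the machine is forwarded to port 8008 inside the container
parsed = parse_port_mapping("8011:8008")
print(parsed)  # {'host': 8011, 'container': 8008}
```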

one year ago

0 Hi! Is There Something Happening With The

https://github.com/allegroai/clearml/commit/56825f9e7af26a43da0f2f12454e23a43e995a25

3 years ago

0 Hey All. Another Question - How Are Private Packages Handled/Installed So That Clearml-Agent Can Execute A Task? I Have A Bunch Of Private Repos For Communicating With The Data Warehouse. I Could Do A System-Wide Installation For It On The Clearml-Agent I

TenseOstrich47 FYI:
This might be what you are looking for 🙂
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L61

3 years ago

0 Hi Trains Community, I See That For The

Hi SteadyFox10, unfortunately trains-agent currently supports only docker as a container solution (I guess it became the de-facto standard).
That said, there is the virtual environment option, where trains-agent installs everything inside a newly created virtual environment. That actually makes it quite easy to expand to other use cases. Essentially, the docker option will spin up a docker container, install trains-agent inside it, and run the execute command there.
Do you feel...

4 years ago

0 Thank You All For Taking The Time To Answer Our Survey (If You Haven'T Already, We Urge You To

I think we added it somewhere in 0.14, anyhow I just checked the Logger doc, it is there now 🙂

4 years ago

0 Hi. Help

Hi PanickyMoth78

I had several pipeline components getting it and uploading files to it concurrently.

Should not be a problem

I've attached its log file which only mentions skipping one file (a warning)

So what exactly is the error you are getting?

2 years ago

0 Trains[Azure] Install - Azure Dependencies Not Latest. Trains Depends On Older Version Of Azure Python Sdk. My Project Already Has Dependency On The Latest Version. How Can This Be Resolved? Installing Collected Packages: Azure-Storage-Common, Azure-Stor

Check here:
https://github.com/allegroai/trains/blob/master/docs/trains.conf#L78

You can configure credentials based on the bucket name. Should work for Azure as well

4 years ago

0 Hi, I Have A Small Issue About Gpu Monitoring. I Run My Training Inside A Singularity Container And I Set The Cuda_Visible_Devices Variable. However, I Get The Following Message:

Okay could you test with export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/.singularity.d/libs/

4 years ago

0 I Have Setup A

Q. Would someone mind outlining what the steps are to configuring the default storage locations, such that any artefacts or data which are pushed to the server are stored by default on the Azure Blob Store?

Hi VivaciousPenguin66
See my reply here on configuring the default output uri on the agent: https://clearml.slack.com/archives/CTK20V944/p1621603564139700?thread_ts=1621600028.135500&cid=CTK20V944
Regarding permission setup:
You need to make sure you have the Azure blob credenti...

3 years ago

0 Hi, I'M Trying To Run Task.Init Inside A Jupyter Notebook For The First Time (Used It A Lot Before In Normal Python Scripts), And I Get A Warning-

I did not start with python -m, as a module. I'll try that

I do not think this is the issue.
It sounds like anything you do on your specific setup will end with the same error, which might point to a problem with the git/folder ?

3 years ago

0 Hello! Since Today I Get

I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)

My thinking is the issue might be on the env file we are passing to conda, I can't find any other diff.
BTW:
ReassuredTiger98 Can I send you a specific wheel with more debug prints to check (basically it will print the conda env YAML it is using)?

3 years ago

0 Hello Everyone. I'M Getting Started With Clearml. I'M Trying Hpo Atm And Have Successfully Run The Base Task. When Running The Clone Of The Base Task In One Of The Agents, I'M Getting Following Error. Any Suggestions? Tia

The base task is self-contained, i.e. it downloads the training/eval data directly and has direct access to it

I think this is the main issue, how come it does not catch it? Are you using argparser ?

one year ago

0 If I Have A Dataset And I Process It And I Want The Processed Data As Another Dataset, Is Parent The Right Approach?

Parent makes sense if you are changing the data of the parent version but some data is preserved. In that case the delta-based storage will only store the diff.
If everything is different, and you call sync for example, then it will not reference any previous "snapshot", so there will be no redundancy in storage, but you still get a pointer to the "parent" version.
Make sense ?
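
To illustrate the idea (this is only a sketch of delta-based storage in general, not ClearML's actual implementation), assume each dataset version is described by a mapping of relative path to content hash. The `delta_against_parent` helper and the sample hashes are hypothetical.

```python
def delta_against_parent(parent_files, child_files):
    """Compare two {relative_path: content_hash} mappings."""
    added = {p: h for p, h in child_files.items() if p not in parent_files}
    modified = {
        p: h
        for p, h in child_files.items()
        if p in parent_files and parent_files[p] != h
    }
    removed = sorted(p for p in parent_files if p not in child_files)
    return {"added": added, "modified": modified, "removed": removed}


parent = {"img/a.png": "h1", "img/b.png": "h2"}

# Some data preserved: only b.png (changed) and c.png (new) need storing
partial = delta_against_parent(parent, {"img/a.png": "h1", "img/b.png": "h3", "img/c.png": "h4"})

# Everything different: nothing is reused from the parent
full = delta_against_parent(parent, {"out/a.png": "h5", "out/b.png": "h6"})

print(partial)
print(full)
```

In the second case the child stores all of its own files and the parent link is just a pointer, which matches the "no redundancy, but you still get a pointer" point above.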

3 years ago

0 Hello! I Get The Following Error In Results->Console After A Task Is Sent For Remote Execution (Using Sdk):

AttractiveCockroach17
Can you print the configuration to the console when you start the run (you will get a local print and then later the remote print)? Are they the same? Are the 3 runs the same (local / remote print)?

2 years ago

0 Hi, Is There Any Document About Migration Clearml-Server. Currently, I Have Clearml-Server Running On Servera But I Want To Move All Data (Including Artifacts, Task, Dataset) From Servera To Serverb.

Nice!

2 years ago

0 When It Comes To Continuous Training, I Wanted To Know How You Train Or Would Train If You Have Annotated Data Incoming? Do You Train Completely Online Where You Train As Soon As You Have A Training Example Available? Do You Instead Train When You Have A

My main query is do I wait for it to be a sufficient batch size or do I just send each image as soon as it comes to train

This is usually a cost optimization issue. Generally speaking, if GPU uptime is not an issue, then the process is stochastic anyhow, so waiting for a batch or not is not the most important factor (unless you use a batchnorm layer, in which case this is basically a must).
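
If you do decide to wait for a batch, a minimal sketch of the buffering side could look like the following. `BatchBuffer` is a hypothetical helper and the batch size of 3 is an arbitrary example; the actual training call is left out.

```python
class BatchBuffer:
    """Collect incoming samples and release them once a full batch exists."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._items = []

    def add(self, sample):
        """Store a sample; return a full batch when ready, otherwise None."""
        self._items.append(sample)
        if len(self._items) >= self.batch_size:
            batch, self._items = self._items, []
            return batch
        return None

    def pending(self):
        """Number of samples still waiting for a full batch."""
        return len(self._items)


buffer = BatchBuffer(batch_size=3)
batches = []
for sample in range(7):  # stand-in for annotated images arriving over time
    batch = buffer.add(sample)
    if batch is not None:
        batches.append(batch)  # this is where a training step would run

print(batches, buffer.pending())
```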

I would not be able to split the data into train test splits, and that it would be very expensiv...

2 years ago

0 Hello Everyone. I'Ve Just Started Playing With Clearml. In The 2Nd 'Getting Started' Tutorial, I Launched The Agent From Google Colab. But Whenever A Task Is Picked, It Fails For The Following Error. Any Clues? Thank You!

Oh!
I see this is using the colab as a remote agent (i.e. to launch jobs on it).

[ColabKernelApp] CRITICAL | Bad config encountered during initialization: The 'kernel_class' trait of <__main__.ColabKernelApp object at 0x7fa41b29e5c0> instance must be a type, but 'google.colab._kernel.Kernel' could not be imported

Can you send the full log?

6 months ago
