Follow-up; any ideas how to avoid PEP 517 with the auto scaler?
Takes a long time to build the wheels
enable venv caching ?
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L116
Nice ! 🙂
BTW: clone=True means creating a copy of the running Task, but there is basically no need for that. With clone=False it will stop the running process and launch it on the remote host, logging everything on the original Task.
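For reference, this is roughly how it looks in code (a minimal sketch; the queue name "default" and project/task names are just placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# clone=False: stop the local process and relaunch this very Task on the agent,
# everything keeps being logged on the original Task (no copy is created)
task.execute_remotely(queue_name="default", clone=False, exit_process=True)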
Can you verify by adding the following to your extra_docker_shell_script:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L152
extra_docker_shell_script: ["echo machine example.com > ~/.netrc", "echo login MY_USERNAME >> ~/.netrc", "echo password MY_PASSWORD >> ~/.netrc"]
DeterminedToad86
Yes, I think this is the issue: on SageMaker a specific compiled version of torchvision was installed (probably as part of the image)
Edit the Task (before enqueuing) and change the torchvision URL to:
torchvision==0.7.0
Let me know if it worked
Regarding resetting it via code: if you need, I can write a few lines for you to do that, although it might be a bit hacky.
Maybe we should just add a flag saying "use requirements.txt"?
What do you think?
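If you prefer forcing a specific package from code instead of editing the Task in the UI, a minimal sketch (note add_requirements has to be called before Task.init; project/task names here are placeholders):
from clearml import Task

# make the agent install this exact torchvision version instead of the auto-detected one
Task.add_requirements("torchvision", "==0.7.0")
task = Task.init(project_name="examples", task_name="sagemaker run")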
CloudyHamster42
RC probably in a few days, but notice that it will just remove the warnings, I still can't reproduce the double axis issue.
It will be helpful if you could send a small script to reproduce the problem.
Maybe this example code can help ? https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py
BTW: Basically just call Task.init(...)
the rest is magic 🙂
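i.e. something like this at the top of your script (project/task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="my experiment")
# from here on argparse, matplotlib, tensorboard, frameworks etc. are captured automatically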
Does a pipeline step behave differently?
Are you disabling it in the pipeline step ?
(disabling it for the pipeline Task has no effect on the pipeline steps themselves)
Hi OddShrimp85
Is there anywhere I could get a chart that can work with a lower version of k8s? Or any other methods?
I think the solution is to install it manually from the helm chart (basically take it out and build a Job YAML), wdyt?
So if you set it, then all nodes will be provisioned with the same execution script.
This is okay in a way, since the actual "agent ID" is by default set based on the machine hostname, which I assume is unique ?
Yes, but only with git clone 🙂
It is not stored on ClearML, this way you can work with the experiment manager without explicitly giving away all your code 😉
I see.
You can get the offline folder programmatically, then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function):
task.get_offline_mode_folder()
You can also have a soft link of the offline folder (if you are working on a linux machine):
ln -s myoffline_folder ~/.trains/cache/offline
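To make it concrete, a minimal sketch of the round trip (the copied folder path on the online machine is hypothetical; import_offline_session is the import function mentioned above):
from clearml import Task

# on the offline machine
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
# ... your training code ...
print(task.get_offline_mode_folder())  # copy this folder (or its zip) to the online machine

# on the online machine (a folder or a zip both work)
# Task.import_offline_session("/path/to/copied/offline_folder")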
compression=ZIP_DEFLATED if compression is None else compression
wdyt?
So obviously that is the problem
Correct.
ShaggyHare67 how come the "installed packages" are now empty ?
They should be automatically filled when executing locally?!
Any chance someone mistakenly deleted them?
Regarding the python environment, trains-agent creates a new clean venv for every experiment. If you need, you can set in your trains.conf:
agent.package_manager.system_site_packages: true
https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c67...
AstonishingSeaturtle47 , makes sense?
RattySeagull0 I think you are correct, python 3.6 is the one installed inside the docker. Is it important to have 3.7? You might need another docker (or change the installation script and install python 3.7 inside)
Yep it should :)
I assume you add the previous iteration somewhere else, and this is the cause for the issue?
task = Task.init(...)
if task.running_locally():
    # wait for the repo detection and requirements update
    task._wait_for_repo_detection()
    # reset requirements
    task._update_requirements(None)
🙂
Hi TightSheep99
Yes it can, it will upload the meta-data as well as the files (it will also do de-dup and will not upload files that already exist in the dataset, based on the hash of the file content)
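For example (dataset name/project and local path are placeholders), a minimal sketch:
from clearml import Dataset

dataset = Dataset.create(dataset_name="my dataset", dataset_project="datasets")
dataset.add_files(path="/path/to/local/folder")  # de-dup is based on the file content hash
dataset.upload()    # uploads the files + meta-data
dataset.finalize()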
task.connect(model_config)
task.connect(DataAugConfig)
If these are separate dictionaries, you should probably use two sections:
task.connect(model_config, name="model config")
task.connect(DataAugConfig, name="data aug")
It is still getting stuck.
I notice that one of the scalars that gets logged early is logging the epoch while the remaining scalars seem to be iterations because the iteration value is 1355 instead of 26
Wait, so you are seeing some scalars?...
If I were to push the private package to, say, artifactory, is it possible to use that to do the install?
Yes that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65
feature is however available in the Enterprise Version as HyperDatasets. Am I correct?
Correct
BTW you could do:
datasets_used = dict(dataset_id="83cfb45cfcbb4a8293ed9f14a2c562c0")
task.connect(datasets_used, name='datasets')
from clearml import Dataset
dataset_path = Dataset.get(dataset_id=datasets_used['dataset_id']).get_local_copy()
This will ensure that not only do you have a new section called "datasets" on the Task's configuration, but you will also be able to replace the datase...
See here the docker_setup_bash_script argument.
It will be executed (no need for the #!/bin/bash, btw) before starting to set up the env inside the container, so apt-get and the like can be executed if needed. Notice that if this is something that always needs to be executed, you can put the same list of commands here: [None](https://github.com/allegroai/clearml-agen...
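From code it would look roughly like this (image name and apt packages are placeholders; to my understanding docker_setup_bash_script accepts a list of shell lines):
from clearml import Task

task = Task.init(project_name="examples", task_name="docker run")
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    # executed inside the container before the environment setup starts
    docker_setup_bash_script=["apt-get update", "apt-get install -y libgl1"],
)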
EnviousStarfish54
plt.show will capture the figure, and if you call it multiple times, it will add a running number to the figure itself (because the figure might change, and you might want the history)
if you call plt.imshow, it's the equivalent of debug image, hence it will be shown in the debug-samples tab, as an image.
Make sense ?
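A tiny example of the difference (assuming Task.init was already called):
import numpy as np
import matplotlib.pyplot as plt

plt.plot([1, 2, 3])
plt.show()                       # captured as a plot, with a running figure number

plt.imshow(np.random.rand(32, 32))
plt.show()                       # captured as a debug image (debug-samples tab)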
SteadyFox10 With pleasure 🙂
BTW: you can retrieve the Task id from its name with:
Task.get_tasks(project_name='my project', task_name='my task name')
See https://allegro.ai/docs/task.html?highlight=get_tasks#trains.task.Task.get_tasks
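e.g. to grab the id itself:
from clearml import Task

tasks = Task.get_tasks(project_name='my project', task_name='my task name')
task_id = tasks[0].id  # get_tasks returns a list of matching Task objects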
What's the Windows version, python version, clearml version, you are using ?
That depends on the HPO algorithm; basically they will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It also might be a Bayesian process, i.e. based on previous sets of parameters and runs, like how hyper-band works (optuna/hpbandster)
Make sense ?
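For illustration, the concurrency limit sits on the optimizer itself; a minimal sketch (base task id, the parameter range and the metric names are placeholders):
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

task = Task.init(project_name="HPO", task_name="optimizer", task_type=Task.TaskTypes.optimizer)
optimizer = HyperParameterOptimizer(
    base_task_id="<base task id>",
    hyper_parameters=[UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1)],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    max_number_of_concurrent_tasks=4,  # never more than 4 experiments queued/running at once
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()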
I have a timeseries dataset with dimension 1,60,1, where the first dimension is the number of data points and the second one is the timestep
I think it should be --input-size 1 60 if the last dimension is the batch size?
(BTW: this goes directly to Triton configuration, it is the information Triton needs in order to run the model itself)
TrickyRaccoon92
I guess elegant is the challenge 🙂
What exactly is the use case ?