In Windows, setting `system_site_packages` to `true` allowed all stages in the pipeline to start, but it doesn't work on Linux.
Notice that it will inherit from the system packages, not from the venv the agent is installed in.
I've deleted tfrecords from the master branch and committed the removal, and set the tfrecords folder to be ignored in .gitignore. I'm trying to find out which changes are considered uncommitted.
you can run git diff
it is essentially...
Okay, now I get it!
Let me think about it for an hour or two 😄
I assume ClearML has some period of time after which it shows this message. Am I right?
Yes you are 🙂
is this configurable?
It is 🙂 `task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)`
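For context, a minimal sketch of where that call would go (project/task names here are just placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="resource monitor timeout")
# give the resource monitor up to 30 minutes to see the first reported iteration
# before it falls back to seconds-from-start reporting (and prints that message)
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
```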
ElegantKangaroo44 I tried to reproduce the "services mode" issue with no success. If it happens again let me know, maybe we will better understand how it happened (i.e. the "master" trains-agent gets stuck for some reason).
Feel free to open an issue on GitHub so this is not forgotten.
Yes please, just to verify my hunch.
I think that somehow the docker mounts the agent is creating are (for some reason) messing it up.
Basically you can just run the following, it will do everything automatically (replace <TASK_ID_HERE> with the actual Task ID):
` docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig ...
Hi ExcitedFish86
In Pytorch-Lightning I use DDP
I think a fix for PyTorch multi-node / process distribution was committed to 1.0.4rc1, could you verify it solves the issue? (rc1 should fix this specific issue)
BTW: no problem working with clearml-server < 1
AbruptHedgehog21 the bucket and the full link are registered on the model object itself, you can see them in the UI, under the Models tab. The only thing you actually need to pass inside is the credentials. Make sense?
So I checked the code, and the Pipeline constructor internally calls Task.init, which means that after you construct the pipeline object, Task.current_task() should return a valid object....
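Something like this minimal sketch should demonstrate it (pipeline/project names are placeholders):
```python
from clearml import Task, PipelineController

# constructing the controller calls Task.init internally,
# so current_task() should already return a valid Task right after
pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
print(Task.current_task())  # expected: the controller's Task object, not None
```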
let me know what you find out
IntriguedRat44 If the monitoring only shows a single GPU (the selected one), it means it reads the correct CUDA_VISIBLE_DEVICES (this is how it knows that you are only using the selected GPU, not all of them).
There is nothing else in the code that will change the OS environment.
Could you print `os.environ['CUDA_VISIBLE_DEVICES']` while running the code to verify?
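For example, something as simple as this (just a sanity check, nothing ClearML-specific):
```python
import os

# print what the running process actually sees
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))
```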
GloriousPenguin2 Hmm, the UI might strip it?! I mean, in most cases it should not be there in the first place. Maybe we need to make sure that, if provided, the web UI uses the stored Plotly definition. If that's the case, we need to make sure that by default we do not store it, so in most cases the UI can use it to improve the layout. wdyt?
So the main difference is that Kedro pipelines are function-based steps (I might be over-simplifying, so please take it with a grain of salt), while a ClearML pipeline step is a Job, i.e. it needs its own environment and runs for longer than a few seconds (as opposed to a single function).
Try adding this environment variable: `export TRAINS_CUDA_VERSION=0`
One example is a node that resizes the images; this node receives a Dataset as input, iterates over each image, resizes it, and outputs a new Dataset, which is used in the next node downstream in the Pipeline.
I agree, this sounds like a "function" rather than a job, so better suited for Kedro.
organization structure
and see for yourself (this pipeline has two nodes, `train_model` and `predict`)
Interesting! Let me dive into that and ...
GrotesqueDog77 this should just work, decorate the functions with `@PipelineDecorator.component` and call the functions one after the other: `paths = step_one()`, then `step_two(paths)`
ClearML will make sure it serializes the strings and passes them to step two (of course step two should actually run on a machine with access to the same folder, but that is another issue 🙂)
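A minimal sketch of the full flow (the names, paths, and return values here are illustrative only):
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["paths"])
def step_one():
    # the returned strings are serialized and passed to the next step
    return ["/data/a.txt", "/data/b.txt"]

@PipelineDecorator.component()
def step_two(paths):
    print("received:", paths)

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="1.0.0")
def pipeline_logic():
    paths = step_one()
    step_two(paths)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run everything in the local process for testing
    pipeline_logic()
```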
BTW: CloudyHamster42 I think this issue was discussed on GitHub, and the final "verdict" was that we should have an option to split/combine graphs on the UI side (i.e. similar to the "smoothing" or wall-time axis options, etc.)
I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
The only problem is that the remote code did not install pandas; once the package is there, we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense?
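One way to make sure the agent installs pandas on the remote machine (a sketch; the task id and artifact name are placeholders):
```python
from clearml import Task

# must be called before Task.init so pandas is added to the Task's requirements
Task.add_requirements("pandas")
task = Task.init(project_name="examples", task_name="read artifact remotely")

source_task = Task.get_task(task_id="<TASK_ID_WITH_ARTIFACT>")
df = source_task.artifacts["data"].get()  # deserializing a DataFrame needs pandas installed
```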
Maybe that's the issue:
https://github.com/googleapis/python-storage/issues/74#issuecomment-602487082
It actually started executing your code, but it did not capture it correctly:
/root/.clearml/venvs-builds/3.10/bin/python -u /root/.clearml/venvs-builds/3.10/code/colab_kernel_launcher.py
Which I assume means the actual Task had bad code.
What do you have under the Task's Execution tab in the UI (the one you were launching, i.e. enqueueing)?
parser.add_argument( "--dataset_mean", type
=
float, nargs
=
"+", default
=
0.5)
I think providing `nargs='+'` assumes the type is a list. Nonetheless we should be able to support it. Could you please add a GitHub issue so we do not forget?
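For reference, a minimal repro sketch of that argument definition (project/task names are placeholders):
```python
import argparse
from clearml import Task

task = Task.init(project_name="examples", task_name="nargs argument")
parser = argparse.ArgumentParser()
parser.add_argument("--dataset_mean", type=float, nargs="+", default=0.5)
args = parser.parse_args()
# when passed on the command line, nargs="+" yields a list of floats,
# while the default here is a plain float
print(args.dataset_mean)
```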
On a side note, is there any way to automatically give more meaningful names to the running docker containers?
What do you mean by that? Running where? And where will you see them?
No, by definition the agent will only execute one Task at a time, but you can spin up a second agent on the same GPU :)
Just wanted to know how many people are actively working on clearml.
probably 30+ 🙂
ReassuredTiger98 are you worried about a lack of support? Or are you offering some (it is always welcome)?
Hi SteadyFox10
I promised to mention here once we start working on ignite integration, you can check it here:
https://github.com/jkhenning/ignite/tree/trains-integration
Feel free to provide insights / requests 🙂
As for the model upload: the default behavior is that torch.save() calls will only be logged, nothing more. But if you pass the output_uri field to Task.init, then all your models will be uploaded automatically. For example:
` task = Task.init('examples', 'model upload test', o...
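The snippet above is cut off; a complete sketch of the same idea (the bucket path is a placeholder, any storage URI or shared folder works):
```python
import torch
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="model upload test",
    output_uri="s3://my-bucket/models",  # placeholder destination
)
model = torch.nn.Linear(10, 2)
# with output_uri set, this checkpoint is captured and uploaded automatically
torch.save(model.state_dict(), "model.pt")
```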
I just set the git credentials in the `clearml.conf` and it works out of the box
git has issues with passing the user/token from the main repo to the submodules, hence my surprise that it is working out-of-the-box.
Do notice that if you are using an ssh-key this is a non-issue.
Nope, no `.netrc` defined anywhere, ...
If this is the case, can you try adding the following to your "extra_vm_bash_script":
` echo machine example.com > ~/.netrc && echo log...
Essentially the example provided just prints out IDs to the log file,
What do you mean?
Hi @<1724960468822396928:profile|CumbersomeSealion22>
As soon as I refactor my project into multiple folders, where on top-level I put my pipeline file, and keep my tasks in a subfolder, the clearml agent seems to have problems:
Notice that you need to specify the git repo for each component. If you have a process (step) with more than a single file, you have to have those files inside a git repository, otherwise the agent will not be able to bring them to the remote machine
Yes, I do have my files in the git repo. Although I have not quite understood which part it takes from the remote git repo, and which part it takes from my local system.
it will do "git pull" on the remote machine and then apply any uncommitted changes it has stored in the Task
It seems that one also needs to explicitly hand in the git repo in the pipeline and task definitions via PipelineController,
Correct, unless the pipeline logic and the steps are in the same git repo, you can...
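A hedged sketch of what that can look like with function steps (the repo URL, names, and the helper function are placeholders, and it assumes a clearml version where add_function_step accepts a repo argument):
```python
from clearml import PipelineController

def prepare_data():
    # placeholder step logic; the real code lives in the git repo referenced below
    return "/data/prepared"

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
pipe.add_function_step(
    name="prepare",
    function=prepare_data,
    function_return=["dataset_path"],
    repo="https://github.com/<org>/<repo>.git",  # placeholder repo
    repo_branch="main",
)
pipe.start()
```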
Hi CurvedHedgehog15
User aborted: stopping task (3)
?
This means "someone" externally aborted the Task, in your case the HPO aborted it (the sophisticated HyperBand Bayesian optimization algorithms we use, both Optuna and HpBandster) will early stop experiments based on their performance and continue if they need later