Yes, this is exactly how the ClearML k8s glue works (notice that the resource allocation, spinning nodes up/down, is done by k8s, which can sometimes take a while). If you only need "bare metal" nodes on the cloud, it might be more efficient to use the AWS autoscaler, which essentially does the same thing.
Please feel free to do so (always better to get it from a user not the team behind the product 😉 )
Hi FiercePenguin76
Maybe it makes sense to use
schedule_function
I think you are correct. This means the easiest would be to schedule a function, and have that function do the Task cloning/en-queuing. wdyt?
As a side note, maybe we should have the ability to pass a custom function that returns a Task ID. The main difference is that the Task ID that was created will be better logged / visible (as opposed to the schedule_function, where the fact there was a Task that was created / ...
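Something along these lines, for illustration (the template Task ID, queue name and the exact add_task() arguments are assumptions, so double-check against the TaskScheduler in your clearml version):
```
# rough sketch: the scheduled function clones a template Task and enqueues the clone
from clearml import Task
from clearml.automation import TaskScheduler

def clone_and_enqueue():
    template = Task.get_task(task_id="TEMPLATE_TASK_ID")  # placeholder ID
    cloned = Task.clone(source_task=template)
    Task.enqueue(cloned, queue_name="default")  # placeholder queue

scheduler = TaskScheduler()
# assuming add_task() accepts a schedule_function callable and cron-like timing args
scheduler.add_task(schedule_function=clone_and_enqueue, minute=30)
scheduler.start()
```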
Hi FlutteringWorm14
Is there some way to limit that?
What do you mean by that? are you referring to the Free tier ?
BTW: latest PyCharm plugin with 2022 support was just released:
https://github.com/allegroai/clearml-pycharm-plugin/releases/tag/1.1.0
I cannot test it at the moment, hence my question.
JuicyFox94 any chance you can blindly approve ?
if in the "installed packages" I have all the packages installed from the requirements.txt than I guess I can clone it and use "installed packages"
After the agent finishes installing the "requirements.txt", it will put the entire "pip freeze" back into the "installed packages". This means that later we will be able to fully reproduce the working environment, even if packages change (which will eventually happen, as we cannot expect everyone to constantly freeze versions).
My problem...
not sure what is the "right way" 🙂
But I do pkill -f "trains-agent --gpus 0"
This will kill a process that was started with "trains-agent --gpus 0". Notice it matches the command pattern, so it has to match the way you executed the agent. You can check it with ps -Af | grep trains-agent
Thanks ShortElephant92 ! PR looks good, I'll ask the guys to take a look
I'm running agent inside docker.
So this means venv mode...
Unfortunately, right now I can not attach the logs, I will attach them a little later.
No worries, feel free to DM them if you feel this is to much to post them here
Hi FierceHamster54
Sure, just do:
dataset = Dataset.get(dataset_project="project", dataset_name="name")
This will by default fetch the latest version
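For example (the project / dataset names are placeholders):
```
from clearml import Dataset

# the latest version is returned when no dataset_id / version is specified
dataset = Dataset.get(dataset_project="project", dataset_name="name")
local_path = dataset.get_local_copy()  # cached, read-only local copy of the dataset
print(local_path)
```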
The cool thing about using the trains-agent is that you can change any experiment parameter and automate the process, so you get hyper-parameter optimization out of the box, and you can build complicated pipelines:
https://github.com/allegroai/trains/tree/master/examples/optimization/hyper-parameter-optimization
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
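For reference, a rough sketch of the optimizer setup (the base task ID, parameter paths, metric names and queue are all placeholders; the older trains package exposes the same classes under trains.automation):
```
from clearml.automation import (
    HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange,
)

optimizer = HyperParameterOptimizer(
    base_task_id="TEMPLATE_TASK_ID",  # the experiment to clone and mutate
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("General/batch_size", values=[32, 64, 128]),
    ],
    objective_metric_title="validation",  # metric reported by the experiment
    objective_metric_series="loss",
    objective_metric_sign="min",
    execution_queue="default",            # agents listening on this queue run the trials
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```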
Fully automatic, just have them defined and call Task.init; everything else will work out of the box.
Notice the Env will override clearml.conf, so you can have clearml.conf with other default values inside the container, and have the Env override the definition
(not to worry, it is not a must to have clearml.conf, it's just a nice way to add default values)
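For example, something like this (the values are placeholders; normally you would set these in the container environment rather than in code, os.environ is used here just to keep the illustration in one file):
```
import os

# assumption: these are set before Task.init() runs, e.g. via the container env
os.environ["CLEARML_API_HOST"] = "https://api.clear.ml"
os.environ["CLEARML_WEB_HOST"] = "https://app.clear.ml"
os.environ["CLEARML_FILES_HOST"] = "https://files.clear.ml"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"

from clearml import Task
task = Task.init(project_name="examples", task_name="env override demo")
```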
Hi JitteryCoyote63
I think there is a GitHub issue (feature request) for it; this is not trivial to build (basically you need the agent to first temporarily pull the git repo, apply the changes, build the docker image, remove the temp build, and restart with the new image).
Any specific reason for not pushing a docker image, or using the extra docker bash script on the Task itself?
CheerfulGorilla72
upd: I see NaN in the tensorboard, and 0 in ClearML.
I have to admit, since NaNs are actually skipped in the graph, should we actually log them ?
Hi GloriousPenguin2 , Sorry this is a bit confusing. Let me expand:
When converting into a plotly object (the default), you cannot really control the dimensions of the plot in the UI programmatically; you can however drag the separator and expand the width / height. If you pass the argument report_image=True to report_matplotlib_figure, it will create a static image from the matplotlib figure (as rendered locally) and use that as the figure. This way you get exactly WYSIWYG, but the...
Agreed, MotionlessCoral18 could you open a feature request on the clearml-agent repo please? (I really do not want this feature to get lost, and I'm with you on its importance, let's make sure we have it configured from the outside)
...And I saw that it uploads the notebook itself as a notebook. Is that normal? Is there a way to disable it?
Hi FriendlyElk26
Yes, this is normal: it backs up your notebook as well as converts it into python code (see "Execution - uncommitted changes") so that later the clearml-agent will be able to run it for you on remote machines.
You can also use task.connect({"param": "value"})
to expose arguments to use in the notebook so that later you will be able to change them from the U...
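For example (the parameter names / values are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="notebook demo")
params = {"param": "value", "learning_rate": 0.001}
# connect() registers the dict; when the clearml-agent reruns the notebook,
# values edited in the UI are fed back into this dict
params = task.connect(params)
print(params["learning_rate"])
```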
Thanks GorgeousMole24
That is a very good point! passing to product guys
UpsetBlackbird87
pipeline.start()
will launch the pipeline itself on a remote machine (a machine running the services agent).
This is why your pipeline is "stuck": it is not actually running.
When you call start_locally(), the pipeline logic itself runs on your machine and the nodes run on the workers.
Makes sense?
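A minimal sketch of the difference (pipeline / step / queue names are placeholders):
```
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
pipe.add_step(name="step_one", base_task_project="examples", base_task_name="template task")

# Option A: the pipeline logic itself is enqueued and runs on the services agent
# pipe.start(queue="services")

# Option B: the pipeline logic runs on this machine, steps still go to their execution queues
pipe.start_locally()
```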
- Be able to trigger the “pure” function (e.g. train()) locally, without any clearml code running, while driving it from a configuration, e.g. a path to the data.
When you say " without any http://clear.ml code" do mean without the agent, or without using the Clearml.Dataset ?
- Be able to trigger the “clearml decorator” (e.g. train_clearml()) while driving it from configuration, e.g. dataset_id
Hmm I can think of:
```
def train_clearml(local_folder=None, dataset_id=None):
    ...
```
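The snippet above got cut off; purely for illustration, here is roughly what I had in mind (everything beyond train_clearml's signature, including the project / task names and the stand-in train(), is an assumption):
```
from clearml import Dataset, Task

def train(local_folder=None):
    # stand-in for your "pure" training function from the question
    print("training on", local_folder)

def train_clearml(local_folder=None, dataset_id=None):
    # illustrative wrapper: resolve the data location, then call the pure train()
    Task.init(project_name="examples", task_name="train")  # placeholder names
    if local_folder is None and dataset_id is not None:
        # fetch a local copy of the ClearML Dataset when only an ID is given
        local_folder = Dataset.get(dataset_id=dataset_id).get_local_copy()
    train(local_folder=local_folder)
```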
That is odd, can you send the full Task log? (Maybe some oddity with conda/pip ?!)
Hi DeliciousBluewhale87
You can achieve the same results programmatically with Task.create
https://github.com/allegroai/clearml/blob/d531b508cbe4f460fac71b4a9a1701086e7b6329/clearml/task.py#L619
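For example (repo / script values are placeholders; check the Task.create docstring in your version for the full argument list):
```
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="created programmatically",
    repo="https://github.com/user/repo.git",
    branch="main",
    script="train.py",
    packages=["clearml"],
    add_task_init_call=True,  # injects Task.init() if the script does not call it
)
Task.enqueue(task, queue_name="default")  # placeholder queue
```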
…every user in the server has the same credentials, and they don’t need to know them… makes sense?
Makes sense: single credentials for everyone, without the need to distribute them.
Is that correct?
I am actually saving a dictionary that contains the model as a value (+ training datasets)
How are you specifically doing that? pickle?
Because it lives behind a VPN and github workers don’t have access to it
makes sense
If this is the case, I have to admit that combining offline-mode and remote execution makes sense, no?
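For reference, the flow would look roughly like this (paths / names are placeholders):
```
from clearml import Task

# On the machine without server access: record everything locally
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
# ... training code ...
task.close()  # writes a local zip of the session; its path is printed in the log

# Later, from a machine that can reach the server:
# Task.import_offline_session(session_folder_zip="/path/to/offline_session.zip")
```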
Hi ZippySheep23
Any ideas what might be happening?
I think you passed the upload limit (2.36 GB) 🙂
So that agents on different nodes will probably require different CUDA-version images.
That makes sense SarcasticSquirrel56
I would edit the helm chart (or deploy manually) based on a selector that will select the different nodes/gpus and assign the correct containers (i.e. matching CUDA versions to the diff GPUs / drivers)
BTW: you can also play around with the k8s glue, which would dynamically spin up pods based on ClearML Tasks.
wdyt?