BTW:
I have very small text files that make up a dataset and compression seems to take most of the upload time
How long does it take? And how come it is not smaller in size?
Is this a config on your side, or something I can change if we had the enterprise version?
Yes, this is one of the things you can configure
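One of the knobs is the compression used when uploading the dataset files. A minimal sketch, assuming Dataset.upload() accepts a compression argument (the project/dataset names and folder path below are placeholders):
` # Upload very small text files without spending time on compression.
# ZIP_STORED packs the files without deflating them (speed over size).
from zipfile import ZIP_STORED
from clearml import Dataset

ds = Dataset.create(dataset_name="tiny-text-files", dataset_project="examples")
ds.add_files(path="./data")
ds.upload(compression=ZIP_STORED)
ds.finalize() `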
Hi @<1523701868901961728:profile|ReassuredTiger98>
Anyone here with any idea why my service tasks get aborted when going to sleep?
I think I understand the issue, clearml==1.4.0
try running with the latest clearml (1.10.x)
It will keep pinging the backend "I'm alive" so the backend does not think this process is dead (which I suspect is what happened: after 2 hours the backend basically set the Task to aborted because it "thought" it was killed)
OutrageousSheep60 before I can answer, maybe you can explain why "zipping" them does not fit your workflow?
Hmm SuccessfulKoala55 any chance the nginx http was pushed to v1.1 on the latest cloud helm chart?
Hi @<1724235687256920064:profile|LonelyFly9>
So, I noticed that with the REST API at least, the /tasks.get_all endpoint appears to have an undocumented maximum page size of 500.
Yeah, otherwise the request size might be too big, but you have pagination:
page (integer, optional, minimum value: 0) - Page number, returns a specific page out of the resulting list of tasks
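A minimal paging sketch, assuming the APIClient wrapper shipped with the clearml package and that tasks.get_all accepts page / page_size:
` # Page through tasks.get_all in chunks of 500 (the apparent maximum page size).
from clearml.backend_api.session.client import APIClient

client = APIClient()
page, page_size = 0, 500
all_tasks = []
while True:
    batch = client.tasks.get_all(page=page, page_size=page_size)
    all_tasks.extend(batch)
    if len(batch) < page_size:
        break
    page += 1
print("fetched", len(all_tasks), "tasks") `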
Hi RoughTiger69
unfortunately, the model was serialized with a different module structure - it was originally placed in a (root) module called model ....
Is this like a pickle issue?
Unfortunately, this doesn't work inside clear.ml since there is some mechanism that overrides the import mechanism using import_bind.__patched_import3
What error are you getting? (meaning why isn't it working)
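(Not clearml-specific, but the usual workaround for a pickle saved under a different root module is to alias the old module name before loading; a sketch, where my_pkg.model and model.pkl are placeholders for the actual layout:)
` # Alias the original root module name so pickle can resolve e.g. "model.MyNet".
import pickle
import sys
import my_pkg.model  # the module where the class currently lives (placeholder)

sys.modules["model"] = my_pkg.model
with open("model.pkl", "rb") as f:
    net = pickle.load(f) `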
Hi @<1547028031053238272:profile|MassiveGoldfish6>
The issue I am running into is that this command does not give me the dataset version number that shows up in the UI.
Oh no, I think you are correct, it will not return the version per dataset (I will make sure we add it)
But with the dataset ID you can grab all the properties: Dataset.get(dataset_id="aabbcc").version
wdyt?
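For example, a minimal sketch (the dataset ID is a placeholder):
` from clearml import Dataset

ds = Dataset.get(dataset_id="aabbcc")
print(ds.id, ds.version)  # the version string shown in the UI `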
I want to inject a bash command after the repo has been cloned (and maybe even after the venv has been installed).
LazyTurkey38 the created venv inherits from the system environment, so in theory you can do all the installation on the system python and the created venv will just inherit the packages, no?
(btw: just to clarify, there is only one entry point for the custom bash script and that is before everything, so users can configure the container before the agent starts)
Yes, I can communicate with the server; I managed to put tasks in the queue and retrieve them, as well as run tasks with metrics reporting
Through the UI or python code ?
Hi UnevenHorse85
As far as I understand, users use the logins and passwords specified in config/apiserver.conf to access the webserver UI, and the key/secret key from their local ~/clearml.conf to access the apiserver.
Correct
access apiserver. What is the use of all the other security keys?
To be able to configure the SDK client (i.e. clearml package) from OS environment and not clearml.conf file
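For example, a minimal sketch of configuring the client from environment variables only (the URLs and credentials are placeholders); in practice you would usually export these in the shell or container environment before the process starts:
` import os

# Must be set before the first clearml session/Task is created
os.environ["CLEARML_API_HOST"] = "https://api.clear.ml"
os.environ["CLEARML_WEB_HOST"] = "https://app.clear.ml"
os.environ["CLEARML_FILES_HOST"] = "https://files.clear.ml"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"

from clearml import Task
task = Task.init(project_name="examples", task_name="env-config-check") `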
if project_name is None and Task.current_task() is not None:
    project_name = Task.current_task().get_project_name()
This should have fixed it, no?
Hi PompousParrot44
So do you mean something like:
` task_model_a = Task.get_task(task_id='id_a')
task_model_b = Task.get_task(task_id='id_b')
model_a_file = task_model_a.models['output'][-1].get_local_copy()
model_b_file = task_model_b.models['output'][-1].get_local_copy() `
JuicyFox94
NICE!!! this is exactly what I had in mind.
BTW: you do not need to put the default values there; it reads the defaults from the package itself (trains-agent/trains) and uses the conf file as overrides, so this section only needs to contain the parts that matter (like cache location, credentials, etc.)
And is there an easy way to get all the metrics associated with a project?
Metrics are per Task, but you can get the min/max/last of all the tasks in a project. Is that it?
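A minimal sketch, assuming Task.get_tasks() and the get_last_scalar_metrics() helper (the project name is a placeholder):
` from clearml import Task

# get_last_scalar_metrics() returns {title: {series: {"last": .., "min": .., "max": ..}}}
for task in Task.get_tasks(project_name="examples"):
    for title, series in task.get_last_scalar_metrics().items():
        for name, values in series.items():
            print(task.id, title, name, values.get("last"), values.get("min"), values.get("max")) `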
I might have an idea, could you test with:
` from clearml import Task
Task._report_subprocess_enabled = False
...
real code here `
Hi DeliciousBluewhale87
Hmm, good question.
Basically the idea is that if you have an ingestion service on the pods (i.e. as part of the yaml template used by the k8s glue), you can specify to the glue what the exposed ports are, so it knows (1) the maximum number of instances it can spin, e.g. one per port, and (2) it will set the external port number on the Task, so that the running agent/code is aware of the exposed port.
A use case for it would be combining the clearml-session with the k8s gl...
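Rough illustration only: the K8sIntegration parameter names below (ports_mode, num_of_services) are taken from the clearml-agent k8s glue example and should be treated as assumptions, not a verified API:
` from clearml_agent.glue.k8s import K8sIntegration

k8s = K8sIntegration(
    ports_mode=True,       # one external port per spun-up instance (assumption)
    num_of_services=10,    # upper bound on concurrently exposed instances (assumption)
)
k8s.k8s_daemon(queue="default")  # serve tasks from this queue (assumption) `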
Is the code in this "other" repo downloaded to the agent's machine? Or is the component's code pushed to the machine the repository is on?
Yes this repo is downloaded into the agent, so your code has access to it
You mean to add the extra index url?
you could use :
https://github.com/allegroai/clearml-agent/blob/5f0d51d485629e9dfc2d826622524461e3fcae8a/docs/clearml.conf#L63
BattyLion34
Maybe something inside the task is different?!
Could you run these lines and send me the result:
from clearml import Task
print(Task.get_task(task_id='failing task id').export_task())
print(Task.get_task(task_id='working task id').export_task())
Okay I found the issue ( I think),
If the images are reported very quickly, it will "decide" you are about to override the previous one (i.e. 101 -> overwriting 0, which makes sense; the bug was that it would disable the 101 from uploading and not the 0)
Test fix:
in /backend_interface/metrics/events.py, line 292, change:
` last_count = self._get_metric_count(self.metric, self.variant, next=False)
if abs(self._count - last_count) > int(self._file_history_size):
...
SoreDragonfly16 notice that if you abort a task in the web UI, it will do exactly what you described: print a message and quit the process. Any chance someone did that?
ModelCheckpoint('best_model', save_best_only=True)
That worked for me now, what's the diff
I think this is the discussion you are after:
https://clearml.slack.com/archives/C01H5VAUZ8R/p1612452197004900?thread_ts=1612273112.002400&cid=C01H5VAUZ8R
Hi DilapidatedDucks58 ,
Are you running in docker or venv mode?
Do the works share a folder on the host machine?
It might be a syncing issue (not directly related to the trains-agent, but to the fact that you have 4 processes trying to simultaneously access the same resource)
BTW: the next trains-agent RC will have a flag (default off) for torch-nightly repository support