PompousParrot44 unfortunately not yet 😞
But the gist is :
MongoDB stores experiment data (i.e. execution parameters, git ref, etc.)
ElasticSearch stores results (i.e. metrics, console logs, debug image links, etc.)
Does that help?
JitteryCoyote63 you mean at runtime, while the agent is installing? I'm not sure I fully understand the use case?!
If I were to push the private package to, say, Artifactory, is it possible to use that to do the install?
Yes that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65
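For example, a sketch of the relevant clearml.conf section (the Artifactory URL here is a placeholder, replace it with your own repo):
agent {
    package_manager {
        # extra PyPI-compatible index the agent passes to pip, e.g. your private Artifactory repo
        extra_index_url: ["https://your-artifactory.example.com/artifactory/api/pypi/my-repo/simple"]
    }
}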
I didn't realise that pickling is what triggers clearml to pick it up.
No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)
BTW: what's the use case? Why do you need to open two Tasks in the same code/script ?
Can't say I have noticed that. Is this a delay on the send, which for some reason is correlated with the epochs? What was the case with 0.17.5?
Your code should have worked, i.e. you should see the 'model.h5' in the artifacts tab. What do you have there?
It should look something like this one:
https://demoapp.trains.allegro.ai/projects/531785e122644ca5b85b2e19b0321def/experiments/e185cf31b2634e95abc7f9fbdef60e0f/artifacts/output-model
BTW:
To manually register any model:
from trains import Task, OutputModel

task = Task.init('examples', 'my model')
OutputModel().update_weights('my_best_model.h5')
I want that last python program to be executed with the environment that was created by the agent for this specific task
Well basically they all inherit the Python environment that points to the venv they started from, so at least in theory it should be transparent when the agent is spinning up the initial process.
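To illustrate, a minimal sketch ("final_step.py" is a hypothetical script name) of launching the extra program with the same interpreter the task runs under, so it inherits the venv the agent created:
import subprocess
import sys

# sys.executable is the python inside the venv the agent built for this task,
# so the child process sees the exact same environment and packages
subprocess.check_call([sys.executable, "final_step.py"])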
I eventually found a different way of achieving what I needed
Now I'm curious, what did you end up doing ?
So this is an additional config file with enterprise?
It's an extension of the "clearml.conf" capabilities
Is this new config file deployable via helm charts?
Yes, you can also set it company/user wide using the clearml Vault feature (again enterprise, sorry 😞 )
Hi UnsightlySeagull42
Basically you can get the agent to always add additional arguments for the docker run, such as -v for mounting:
https://github.com/allegroai/clearml-agent/blob/948fc4c6ce1ecf33a74619ad570d69b8188f6db9/docs/clearml.conf#L133
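For example, a sketch of the relevant clearml.conf section (the host/container paths are placeholders):
agent {
    # arguments appended to every "docker run" the agent launches, e.g. an extra volume mount
    extra_docker_arguments: ["-v", "/host/data:/data"]
}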
You can try just pulling the "metric" section of the Task, but I cannot imagine the network bandwidth is the issue?
Could it be load on the clearml-server (i.e. it needs to handle lots of requests)?
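A quick sketch of reading just the scalar summary from a single task (the task ID is a placeholder):
from clearml import Task

t = Task.get_task(task_id='1234abcd')  # placeholder ID
# roughly {'title': {'series': {'last': ..., 'min': ..., 'max': ...}}}
metrics = t.get_last_scalar_metrics()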
or creating a dedicated function I would suggest also including the actual sampled point in the HP space.
Could you expand?
This would be the most common use case, and essentially the reason for running the HPO: understanding the sensitivity of metrics with respect to the hyper-parameters.
Does this relate to:
https://github.com/allegroai/clearml/issues/430
"manually" filtering the keys I've put in for the HP space. I find it a bit strange that they are not saved as part of t...
Sounds good to me. DepressedChimpanzee34 any chance you can add a github feature request, so we do not forget to add it?
I pull all the parameters, and then manually filter on the HP keys (manually=I have to plug them in, they are not part of optimizer object)
So is this an improvement to the optimizer._get_child_tasks_ids(...) interface?
e.g. return a structure like:
[
    {
        'id': task_id,
        'hp1': value, 'hp2': value, 'hp3': value,
        'objective': dict(title='title', series='series', value=42),
    },
]
Hmm, check if this one works:
optimizer._get_child_tasks_ids(
    parent_task_id=optimizer._job_parent_id or optimizer._base_task_id,
    order_by=optimizer._objective_metric._get_last_metrics_encode_field(),
    additional_filters={'page_size': int(top_k), 'page': 0}
)
If it does, let's PR it as a dedicated function
DepressedChimpanzee34 something along the lines of:
from multiprocessing.pool import ThreadPool

p = ThreadPool()

def get_last_metric(t):
    return t.get_last_scalar_metrics()

task_scalars_list = p.map(get_last_metric, top_tasks)
p.close()
We parallelized the network connections, as I'm assuming the delay is in the fetching.
You mean this?
ids = [t.id for t in top_tasks]
Hi RipeGoose2
Just to clarify, the issue with the HTML stuck in the cache is a UI thing: basically the webapp needs to tell the browser not to cache the artifacts; it has nothing to do with how the artifacts are created.
Regardless, we love improvements, so feel free to mess around with the code and PR once you get something useful 😉
Specifically this is where the html conversion happens
https://github.com/allegroai/clearml/blob/9d108d855f784e1fe7f5691d3b7bf3be64576218/clearml/backend_in...
Hi SmilingFrog76
Great question, sadly multi-node is never simple 🙂
Let's start with the basics. Let's assume one worker is available and the other is not, what would you want to happen? (p.s. I'm not aware of flexible multi-node training frameworks, i.e. a framework that can detect another node is available and connect with it mid-training; that said, it might exist 🙂 )
Is it being used to ssh to the instance?
It is used for the SSH client, so it "knows" the SSH server (does that make sense)?
No worries, let's assume we have:
base_params = dict(
    field1=dict(param1=123, param2='text'),
    field2=dict(param1=123, param2='text'),
    ...
)
Now let's just connect field1:
task.connect(base_params['field1'], name='field1')
That's it 🙂
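As a follow-up (using the same base_params sketch from above), any other section can be connected the same way:
task.connect(base_params['field2'], name='field2')  # each named dict gets its own section/prefix in the UI, e.g. field2/param1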
Hi SourSwallow36
- The same docker image is used for all three jobs, just because it is easier to manage and faster to download. The full code is available on the trains-server GitHub. If you want to spin up the containers manually, check the docker-compose.yml in the main repo; it has all the commands there.
- Fork the trains-server, commit the changes and don't forget to PR them ;)
- Elasticsearch is a database; we use it to log all the experiment outputs, console logs, metrics, etc. This...
You should have the metric :monitor:gpu with the variant gpu_0_utilization
Since I see you have none of those, that points to a missing GPU driver ...
Could that be?
Hi @<1578555761724755968:profile|GrievingKoala83>
mount s3 as a cache folder
I'm not sure that would be fast enough for cache ...
How to override /root/.cache/pip path?
in your clearml.conf file:
None
then set it to your PV
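If it helps, a sketch of what that could look like (assuming the agent runs in docker mode; the path is a placeholder for your PV mount):
agent {
    # host folder mounted into the container as the pip cache (i.e. /root/.cache/pip inside the container)
    docker_pip_cache = /mnt/my-pv/pip-cache
}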
Weird issue, I'll make sure we fix compatibility with python 3.9
it is just a local copy, so you can rerun and reconfigure
import os
from trains import Task

os.environ['TRAINS_PROC_MASTER_ID'] = '1:da0606f2e6fb40f692f5c885f807902a'
os.environ['OMPI_COMM_WORLD_NODE_RANK'] = '1'
task = Task.init(project_name="examples", task_name="Manual reporting")
print(type(task))
Should be: <class 'trains.task.Task'>
Actually it cannot be deferred; long story short, when the agent is running the same code, we have to verify and pass arguments at import time. I have to wonder: I'm expecting the env variables to be preset (i.e. previously set for the entire environment), so how come they are manually set inside the code (and wouldn't that break when running with an agent)?
Hey GiganticTurtle0 ,
So basically the issue is that the pipeline function (prediction_service) is getting a dict as input, while it is expecting to get basic types... if you were to do the following, it would have worked as expected:
prediction_service(**default_config)
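To illustrate, a sketch with made-up names (assuming a signature like prediction_service(start, end)):
default_config = {'start': 0, 'end': 10}

# prediction_service(default_config)  -> the component receives a single dict argument
prediction_service(**default_config)  # -> the component receives basic-typed keyword arguments it can log individually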
I will make sure we flatten any dictionary so that we end up with config/start, instead of a serialized version of the dict.
wdyt?