AgitatedDove14 (Moderator)
0 Hi Guys, I Have Been Running The Clearml-Serving For A While Now And I Realize That From Time To Time After A Couple Of Hours The Serving Task (Control Plane) That Is Configured Through The Cli Goes Into Status Abort. This Happens Even Though All The Pods

how can you be snyk and lower than 0.96

Yep, Snyk auto "patching" is great 🙂
As I mentioned, wait for the GH sync tomorrow, a few more things are missing there.
In the meantime you can just do ">= 0.109.1"

10 months ago
0 Hi Guys, I Have Been Running The Clearml-Serving For A While Now And I Realize That From Time To Time After A Couple Of Hours The Serving Task (Control Plane) That Is Configured Through The Cli Goes Into Status Abort. This Happens Even Though All The Pods

Hi @<1569858449813016576:profile|JumpyRaven4>
What's the clearml-serving version you are running?

This happens even though all the pods are healthy and the endpoints are processing correctly.

The serving pods are supposed to ping "I'm alive", and that should verify the serving control plane is alive.
Could it be that no requests are being served?

10 months ago
0 Hi Guys, I Have Been Running The Clearml-Serving For A While Now And I Realize That From Time To Time After A Couple Of Hours The Serving Task (Control Plane) That Is Configured Through The Cli Goes Into Status Abort. This Happens Even Though All The Pods

no requests are being served as in there is no traffic indeed

It might be that it only pings when requests are served

what is actually setting the task status to Aborted?

The server watchdog, basically saying: no one is pinging "I'm alive" on this "Task", so I should abort it.

my understanding was that the daemon thread was deserializing the task of the control plane every 300 seconds by default

Yeah.. let me check that
Basically this sounds like a sort of a bug,...
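For illustration only, the abort mechanism being described boils down to something like this (a hypothetical sketch, not the actual server code; the class, function name and the 300-second threshold are assumptions based on the discussion above):

import time
from dataclasses import dataclass

@dataclass
class ControlPlaneTask:
    # hypothetical stand-in for the server-side Task record of the serving control plane
    last_ping_timestamp: float
    status: str = "running"

WATCHDOG_TIMEOUT_SEC = 300  # illustrative default, echoing the 300s mentioned above

def watchdog_check(task: ControlPlaneTask, now: float | None = None) -> None:
    # serving pods are expected to keep sending "I'm alive" pings; if none arrived
    # within the timeout window, the server watchdog marks the Task as aborted
    now = time.time() if now is None else now
    if now - task.last_ping_timestamp > WATCHDOG_TIMEOUT_SEC:
        task.status = "aborted"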

10 months ago
0 I Have Set

"regular" worker will run one job at a time, services worker will spin multiple tasks at the same time But their setup (i.e. before running the actual task) is one at a time..

7 months ago
0 I Have Set

what if the preexisting venv is just the system python? My base image is python:3.10.10 and I just pip install all requirements in that image. Does that not avoid venv still?

It will basically create a new venv inside the container, forking the existing preinstalled stuff (i.e. the new venv already has everything the system python has preinstalled).
Then it will call "pip install" on all the "installed packages" of the Task,
which should just check everything is there and install nothing...

7 months ago
0 I Have Set

would those containers best be started from something in services mode?

Yes, as long as the machine has enough cpu/ram.
Notice that the services mode will start a second parallel Task after the first one is done setting up the env. If running with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL, with containers that have git/python/clearml-agent preinstalled, the overhead should be minimal.
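For reference, a typical way to spin up such a services worker (a sketch; the queue name and container image are placeholders) would be:
CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 clearml-agent daemon --queue services --services-mode --docker <prebuilt-image>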

or is it possible to get no-overhead with my approach of worker-inside-docker?

No, do not do that, see above e...

7 months ago
0 I Have Set

Hi Guys, just curious here, what was the final issue?
Also out of curiosity, what does that mean: "1.12.2 because some bug that make fastai lag 2x"?

7 months ago
0 I Have Set
  • try with the latest RC 1.8.1rc2

it feels like after git clone, it spends minutes without outputting anything

Yeah, that is odd. Can you run the agent with --debug (add it before the daemon command), and then at the end of the command add --foreground?
Now launch the same task on that queue; you will have a verbose log in the console.
Let us know what you see.
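Putting that together, the command would look roughly like this (the queue name is a placeholder):
clearml-agent --debug daemon --queue default --foreground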

7 months ago
0 Hi. I Get Some Problem With Clearml Agent. I Start Training On My Local Device, Clone Run, And Start This Run In Docker On Cluster. But, Seems Like Clearml Agent Caches Environment (Package Wheels, Python Version, Etc). Can I Config Clearml Agent To Not Cac

StickyBlackbird93 the agent is supposed to solve for the correct version of pytorch based on the CUDA in the container. Sounds like for some reason it fails? Can you provide the log of the Task that failed? Are you running the agent in docker-mode, or inside a docker?

2 years ago
0 Hi. I Get Some Problem With Clearml Agent. I Start Training On My Local Device, Clone Run, And Start This Run In Docker On Cluster. But, Seems Like Clearml Agent Caches Environment (Package Wheels, Python Version, Etc). Can I Config Clearml Agent To Not Cac

I'm running agent inside docker.

So this means venv mode...

Unfortunately, right now I can not attach the logs, I will attach them a little later.

No worries, feel free to DM them if you feel this is too much to post here.

2 years ago
0 Hi. I Get Some Problem With Clearml Agent. I Start Training On My Local Device, Clone Run, And Start This Run In Docker On Cluster. But, Seems Like Clearml Agent Caches Environment (Package Wheels, Python Version, Etc). Can I Config Clearml Agent To Not Cac

Hi StickyBlackbird93
Yes, this agent version is rather old (clearml_agent v1.0.0);
it had a bug where the pytorch aarch64 wheel broke the agent (by default, the agent in docker mode will use the latest stable version, but not in venv mode).
Basically, upgrading to the latest clearml-agent version should solve the issue:
pip3 install -U clearml-agent==1.2.3
BTW, for future debugging, this is the interesting part of the log (notice it is looking for the correct pytorch based on the auto de...

2 years ago
0 I’M Wondering If Someone Has An Example Of How To Use The

Hi @<1533620191232004096:profile|NuttyLobster9>
base_task_factory is a function that gets the node definition and returns a Task to be enqueued,
pseudo code looks like:

def my_node_task_factory(node: PipelineController.Node) -> Task:
    # build the Task this node should run and return it; the pipeline will enqueue it
    task = Task.create(...)
    return task

Make sense?
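For completeness, a minimal self-contained usage sketch (assuming the factory is passed to add_step via its base_task_factory argument; project, names, script and queue are placeholders):

from clearml import Task
from clearml.automation import PipelineController

def my_node_task_factory(node: PipelineController.Node) -> Task:
    # create the Task this node will run; the returned Task is enqueued by the controller
    return Task.create(project_name="examples", task_name=f"node-{node.name}",
                       script="train.py", requirements_file="requirements.txt")

pipe = PipelineController(name="demo-pipeline", project="examples", version="1.0.0")
pipe.add_step(name="step_one", base_task_factory=my_node_task_factory)
pipe.start()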

one year ago
0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Oh sorry, from the docstring, this will work:
` :param bool continue_last_task: Continue the execution of a previously executed Task (experiment)

.. note::
    When continuing the executing of a previously executed Task,
    all previous artifacts / models/ logs are intact.
    New logs will continue iteration/step based on the previous-execution maximum iteration value.
    For example:
    The last train/loss scalar reported was iteration 100, the next report will b...
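In code, continuing the previous run looks like this (a minimal sketch; the project and task names are placeholders):

from clearml import Task

# Reuse the previously executed Task instead of creating a new one; artifacts, models
# and logs stay intact, and new scalars continue from the previous maximum iteration.
task = Task.init(project_name="examples", task_name="train", continue_last_task=True)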
2 years ago
0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Hi VivaciousWalrus21, I tested the sample code, and the gap was evident in Tensorboard as well. This is not clearml generating this jump; it is internal (like the auto de/serialization and continue of the code base).

2 years ago
0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Hi VivaciousWalrus21

After restarting training huge gaps appear in iteration axis (see the screenshot).

The Task.init actually tries to understand what was the last reported iteration and continue from that iteration. I'm assuming that what happens is that your code does that also, which creates a "double shift" that you see as the jump. I think the next version will try to be "smarter" about it and detect this double gap.
In the meantime, you can do:
` task = Task.init(...)...
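The snippet above is cut off. One possible workaround in that spirit (an assumption on my part, not necessarily the original suggestion) is to reset the iteration offset right after Task.init:

from clearml import Task

task = Task.init(project_name="examples", task_name="train")
# Hypothetical workaround: zero the auto-detected iteration offset so new reports
# are not shifted by the previous run's last iteration (avoids the "double shift").
task.set_initial_iteration(0)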

2 years ago
0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Expected behaviour is that it reads the last iteration correctly. At least that is what is stated in the docs.

This is exactly what should happen, are you saying that for some reason it fails?

2 years ago
0 Hi, I Have A Question About

SoggyFrog26 you'll have it in the next RC 🙂
Not sure what's the plan, I know one should be out today/tomorrow, worst case on the next one 🙂

3 years ago
0 Hi, I Have A Question About

Hi SoggyFrog26
Yes, it is stored at ~/.clearml_data.json
Notice you can always change it by passing --id dataset_id
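For example (a sketch; the file name and dataset id are placeholders, using the add subcommand which accepts an --id flag):
clearml-data add --files ./new_file.csv --id <dataset_id>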

3 years ago
0 Hi, I Have A Question About

SoggyFrog26 there is a full pythonic interface, why don't you use this one instead, much cleaner 🙂
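For reference, a minimal sketch of that pythonic interface (project, name and path are placeholders):

from clearml import Dataset

# create a new dataset version, add local files, then upload and finalize it
ds = Dataset.create(dataset_project="examples", dataset_name="my-dataset")
ds.add_files(path="./data")
ds.upload()
ds.finalize()
print(ds.id)  # the new dataset id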

3 years ago
0 Hi, I Have A Question About

I think it would be nicer if the CLI had a subcommand to show the content of ~/.clearml_data.json.

Actually, it only stores the last dataset id at the moment, so no, not much 🙂
But maybe we should have a cmd line that just outputs the current dataset id, this means it will be easier to grab and pipe.
WDYT?
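For now, the file itself can simply be inspected or piped, e.g.:
cat ~/.clearml_data.json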

3 years ago
0 Hello! Since Today I Get

okay, I'll make sure we order it correctly

3 years ago
0 Is There A Way To Set Access Levels Per-User On The Trains Web App? (I'M Basically Looking To Add A Readonly User Role)

Not really 😞
Everyone can do everything; the idea is shareability and accessibility.
I do know that in the paid tier they have full access control, roles, SSO etc., but unfortunately it's way too complicated for the open-source version.
Basically what I'm saying is: trust your fellow colleagues 🙂

4 years ago