Why does ClearML hide the dataset task from the main WebUI?
Basically you have the details from the Dataset page, why should it be mixed with the others?
If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some
.datasets
hidden sub-project.
This may be a request for a "Dataset" tab under the project; why would you need the Dataset Task itself is the main question.
Not all dataset objects are equal, and perhap...
If there is a new issue I will let you know in a new thread
Thanks! I would really like to understand what is the correct configuration
This one should work:
` path = task.connect_configuration(path, name=name)
if task.running_locally():
    my_params = read_from_path(path)
    my_params = change_params(my_params)  # change some stuff
    # store back the change; my_params is assumed to be the content of the param file (text)
    task.set_configuration_object(name=name, config_text=my_params) `
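Notice: when the task is executed by an agent, `task.running_locally()` is False and `connect_configuration` returns a local copy of the configuration stored on the server, so the remote run picks up the stored content.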
Well it is there, do you have it in your docker-compose as well?
https://github.com/allegroai/trains-server/blob/master/docker-compose.yml#L55
Nope - confirmed to be running on the OS's Python environment,
okay so bare metal root is definitely not recommended.
I'm not sure how/why it gets stuck though 😞
Any chance you can run the agent as non-root?
Also, docker mode may be preferable, so it is easier for you to control the environment of the Task
Our server is deployed on a kube cluster. I'm not too clear on how Helm charts etc. work.
The only thing that I can think of is that something is not right with the load balancer on the server, so maybe some requests coming from an instance on the cluster are blocked ...
Hmm, saying that out loud, that actually could be it! Try to add the following line to the end of the clearml.conf on the machine running the agent:
api.http.default_method: "put"
Yes. Because my old issue has never been resolved (though closed), we use the dataset object to upload e.g. local files needed for remote execution.
Ohh now I remember... following this line, can I assume these files are reused, i.e. this is not "per instance"? I have to admit I have a feeling this is a very unique use case, and maybe the "old" way Datasets were shown is better suited?
No, I mean why does it show up in the task view (see attached image), forcing me to clic...
we can add non-clearml code as a step in the pipeline controller.
Yes 🙂 , btw you can kind of already do that, with pre/post function callbacks (notice they are running from the same scope as the actual pipeline controller).
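For example, a minimal sketch (project/step names here are placeholders):
` from clearml import PipelineController

def pre_cb(pipeline, node, param_override):
    # runs inside the controller process, right before the step is launched
    print(f"launching {node.name} with {param_override}")
    return True  # returning False would skip the step

def post_cb(pipeline, node):
    # runs inside the controller process, right after the step completes
    print(f"{node.name} finished")

pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="step_one",
    base_task_project="examples",
    base_task_name="step 1",
    pre_execute_callback=pre_cb,
    post_execute_callback=post_cb,
)
pipe.start() `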
What exactly did you have in mind to put there?
Hi SteadySeagull18
However, it seems to be entirely hanging here in the "Running" state.
Did you set an agent to listen to the "services" queue?
Someone needs to run the pipeline logic itself; it is sometimes part of the clearml-server deployment, but not a must.
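If there isn't one yet, you can spin it up yourself, e.g. (assuming the default "services" queue): `clearml-agent daemon --queue services --docker`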
FlatOctopus65
In my local environment
pipeline_package
is installed in development mode
In order to install the package you need to specify the git repo of the package, this is how the pipeline would know where to bring it from.
Either install it locally with `pip install git+https://github.com/....` or add it to the `packages` argument of the pipeline wrapper: `packages=["git+https://github.com/..."]`
wdyt?
Hi @<1566596960691949568:profile|UpsetWalrus59>
just wondering - shouldn't the job still work if I didn't push the commit yet
How would that work? It does not know which commit to take, and it would also fail on applying the git diff, no?
Hi @<1523701066867150848:profile|JitteryCoyote63>
I found a memory leak
in
Logger.report_matplotlib_figure
Are you sure this is the Logger's fault and not a Matplotlib leak? I'm trying to think how we could create such a mem leak
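A quick isolation test could look like this (a sketch, assuming psutil is installed; uncomment the report call to compare with/without the Logger):
` import os
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import psutil

proc = psutil.Process(os.getpid())
for i in range(1000):
    fig = plt.figure()
    plt.plot(range(100))
    # logger.report_matplotlib_figure(title="t", series="s", iteration=i, figure=fig)
    plt.close(fig)  # matplotlib keeps figures alive unless they are closed
    if i % 100 == 0:
        print(i, proc.memory_info().rss // 1024 ** 2, "MB") `
If memory grows even with the report call commented out, it is matplotlib (or figures not being closed); if it only grows with the call enabled, it's on us.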
wdyt?
Hi SubstantialElk6
We can't seem to find a way for the end user to pass in their git credentials when they run their codes in both agent and non-agent scenarios. Any advice here?
The bottom line is the agent needs to have read-only access to all the repositories so it can launch any Task. I would recommend creating an agent git user with read-only credentials and configuring the agent to use it. wdyt?
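For example, in the agent's clearml.conf (values are placeholders):
` agent {
    # read-only git credentials used when cloning repositories
    git_user: "clearml-readonly"
    git_pass: "personal-access-token"
} `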
How are you starting the agent?
Please let me know what you find 🤞
Can you try to set this in your clearml.conf:
agent.pip_download_cache.enabled: false
this should disable the local caching of your wheel; I suspect there is some issue with the local cache file on Windows...
So you mean 1.3.1 should fix this bug?
Yes it should, see the release notes; there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0
We could use our 8xA100 as 8 workers, for 8 single-gpu jobs running faster than on a single 1xV100 each.
@<1546665634195050496:profile|SolidGoose91> I think that in order to have the flexibility there you need the "dynamic" GPU allocation that is only part of the "enterprise" offering 😞
That said, why not allocate a single a100 machine? no?
We have tried to manually restart tasks reloading all the scalars from a dead task and loading latest saved torch model.
Hi ThickKitten19
how did you try to restart them? how are you monitoring dying instances? where and how are they running?
"what's the trains/trains-agent/trains-server versions ?" how can I check it?
trains/trains-agent are pip packages, so: `pip freeze | grep trains`
trains-server version you can check on the /profile page, top-left corner
I had again the same problem but within a remote pipeline setup.
Are you saying the issue is not fixed? Can you verify the pipeline & pipeline components are using at least version 1.104rc0?
Notice there is no need to upgrade the server, only the ClearML python package
I see now.
Let's assume you know which snapshot that was:
` prev_task = Task.get_task(task_id='the_first_training_task_id')
# get the second-from-last checkpoint
checkpoint_url = prev_task.models['output'][-2].url
prev_scalars = prev_task.get_reported_scalars()
new_task = Task.init('example', 'new task')
logger = new_task.get_logger()
# in a for loop, report the prev_scalars with logger.report_scalar
new_task.flush(wait_for_uploads=True)
new_task.set_initial_iteration(22000)
# start the training `
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).
Understood, basically the moment we add nested project view to the dataset (and pipelines for that matter, and both are already being worked on), it should solve everything. Is that correct?
okay that makes sense, if this is the case I would just use clearml-agent execute --id <task_id here>
to continue the training Task.
Do notice you have to reload your last checkpoint from the Task's models/artifacts to continue 🙂
Last question: what is the HPO optimization algorithm? Is it just grid/random search, or BOHB/Optuna? If it is the latter, how do you make it "continue"?
Hi FreshBat85
`clearml_agent: ERROR: 'utf-8' codec can't decode byte 0xfc in position 38: invalid start byte`
This is a notorious issue with python and UTF-8/Unicode support.
Any chance there is "unicode"/utf8 code in the uncommitted changes section?
BTW you can set an environment variable before spinning the agent, telling it to always use UTF8: `set PYTHONUTF8=1`
Hi GiganticTurtle0
You can keep clearml following the dictionary, auto-updating the UI:
args = task.connect(args)
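A minimal sketch (parameter names here are just an example):
` from clearml import Task

task = Task.init(project_name="examples", task_name="connect demo")
args = {"lr": 0.001, "batch_size": 32}
args = task.connect(args)  # shows in the UI; remote runs get the UI-overridden values
print(args["lr"]) `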