AgitatedDove14
Moderator
48 Questions, 8049 Answers
  Active since 10 January 2023
  Last activity 5 months ago

Reputation: 0
Badges: 1 (25 × Eureka!)

0 Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

That didn't give useful info; the issue was that docker was not installed on the agent machine x)

JitteryCoyote63 you mean "docker" was not installed and it did not throw an error ?

3 years ago
0 Was There Ever A Solution To This Request?

Hi @<1730033904972206080:profile|FantasticSeaurchin8>
You mean in the UI, or when reporting via the SDK?

one month ago
0 Hello Everyone ! When I Run My Python Script Localy , Everything Works Fine (It Includes Tensorflow). When I Try To Run It Remotely From App.Clearml I Observe A Weird Error That My Requirement Cannot Be Filled. Adding The Logs. Please If Someone Could Hel

Hi ExasperatedCrocodile76
It seems like it is using the conda package manager, were you using conda when you ran the code manually?
ERROR: This cross-compiler package contains no program /home/ivan/miniconda3/envs/clearML/bin/x86_64-conda_cos6-linux-gnu-gfortran
Why is it trying to install from source code?
BTW: can you test with the latest agent RC? ( pip install clearml-agent==1.4.0rc4 )

one year ago
0 Hey, I'M Trying To Set Up A Clearml Server On Docker As Per Documentation. Everything Goes Well Until The Docker-Compose Up Step, That'S When I Get This Error; Error: Error Pulling Image Configuration: Download Failed After Attempts=6: X509: Certificate

WickedElephant66 this seems like a general network issue, like the docker service is missing your company's firewall certificate.
Can you pull any container from docker hub ?

2 years ago
0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

Ok, but when nvcc is not available, the agent uses the output from nvidia-smi, right? On one of my machines, nvcc is not installed, and in the experiment logs of the agent running there, the agent.cuda value is the version shown by nvidia-smi.

Already added to the next agent's version 😉

3 years ago
0 Hi, Is There A Way To Pull Clearml Datasets To A Mounted Pv Instead Of The Pod'S Local Directory.

Hi @<1523701304709353472:profile|OddShrimp85>
Do you mean Dataset.get_local_copy() ?
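For example, a minimal sketch, assuming the PV is mounted at /mnt/my_pv and the dataset name/project below are placeholders:
` from clearml import Dataset

dataset = Dataset.get(dataset_project='my_project', dataset_name='my_dataset')

# copy the dataset contents into the mounted PV instead of the default cache folder
local_path = dataset.get_mutable_local_copy(target_folder='/mnt/my_pv/datasets/my_dataset')
print(local_path) `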

one year ago
0 What Happens If The Task.Init Doesn'T Happen In The Same Py File As The "Data Science" Stuff I Have A List Of Classes That Do The Coding And I Initialise The Task Outside Of Them. Something Like

I want the model to be stored in a way that clearml-serving can recognise it as a model

Then OutputModel or task.update_output_model(...)
You have to serialize it in a way that your code will later be able to load it.
With XGBoost, when you call model.save, ClearML automatically picks it up and uploads it for you,
assuming you created the task with Task.init(..., output_uri=True).
You can also manually upload the model with task.update_output_model or the equivalent OutputModel class.
if you want to dis...
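As a rough sketch of the flow described above (the project/task names, file name, and framework are placeholders):
` from clearml import Task, OutputModel

# placeholder project/task names
task = Task.init(project_name='examples', task_name='train xgboost', output_uri=True)

# ... training ...
# xgb_model.save_model('model.json')  # auto-captured when output_uri is set

# or register a serialized model file manually
output_model = OutputModel(task=task, framework='xgboost')
output_model.update_weights(weights_filename='model.json') `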

one year ago
0 Hi, We Have A Use Case That We Would Like To Upload A Local Folder Into The Cloud

I think the main difference is that I can see value in having access to the raw format within the cloud vendor, and not only having it as an archive

I see, it does make sense.
Two options: one, as you mentioned, use the ClearML StorageManager to upload the files, then register them as external links with Dataset.
Two, I know the enterprise tier has HyperDatasets, which are essentially what you describe, with version control over the "metadata" and "raw storage" on the GCP, including the ab...
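A rough sketch of the first option (bucket path, project, and file names are placeholders):
` from clearml import StorageManager, Dataset

# upload the raw file to your own cloud storage, keeping its original format
remote_url = StorageManager.upload_file(
    local_file='data/images/img_001.png',
    remote_url='gs://my-bucket/raw/images/img_001.png')

# register the already-uploaded objects as external links in a Dataset
dataset = Dataset.create(dataset_project='my_project', dataset_name='raw_images')
dataset.add_external_files(source_url='gs://my-bucket/raw/images/')
dataset.upload()
dataset.finalize() `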

one year ago
0 Hi Again, I Am Trying To Execute A Pipeline Remotely, However I Am Running Into A Problem With The Steps That Require A Local Package. Basically I Have A Repo, That I Created Specifically For This Pipeline And I Have Packaged It So That I Can Split It I

I would just add git+ None to your requirements (either in the requirements.txt or even better as part of the pipeline/component where you also specify the repo to be used)
The agent will automatically pass the credentials when it installs the repo as a wheel.
wdyt?
btw: you might also get away with adding -e . into the requirements.txt (but you will need to test that one)
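For instance, with a pipeline component it could look something like this (repo URL, branch, and module names are placeholders):
` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    repo='https://github.com/my-org/my-pipeline-repo.git',
    repo_branch='main',
    packages=['git+https://github.com/my-org/my-pipeline-repo.git'],
)
def preprocess(dataset_id: str):
    # the local package is importable because it was installed from the git URL above
    from my_pipeline_repo.preprocessing import run
    return run(dataset_id) `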

7 months ago
0 So, Here'S A Question. Does Clearml Automatically Save Everything Necessary To Continue Training A Pytorch Language Model? Specifically, I'Ve Been Looking At The Checkpoint Folders Created When I'M Training A Huggingface Robertaformaskedlm. I Checked What

If you cannot change the "TrainerState" (i.e. inherit and pass it into the code)
you could also monkey-patch it, something like:
` from dataclasses import asdict

class OurTrainerState(TrainerState):
    @classmethod
    def load_from_json(cls, json_path: str):
        state = super().load_from_json(json_path)
        # upload the state file whenever it is (re)loaded
        Task.current_task().upload_artifact(name='trainer_state', artifact_object=json_path)
        return state

trainer.state = OurTrainerState(**asdict(trainer.state)) `

3 years ago
0 Hi All, I Am Trying To Deploy Clearml Server In A Local Machine. I Followed All The Steps In

nice @<1724960458047229952:profile|EnergeticKoala33> !
The issue was that the agent was trying to start the docker container but had no credentials to do that; your solution is exactly what needed to be done

2 months ago
0 Hi All. I Am Using The Recently Added Trainslogger In Pytorch-Lightning And Experiencing Incoherent Behavior With Model Checkpoint Upload. I Made An Issue On Pytorch-Lightning Github

Hi MelancholyBeetle72, that's a very interesting case. I can totally understand how storing a model and then immediately renaming it breaks the upload. A few questions: is there a way for PyTorch Lightning not to rename the model? Also, I wonder if this scenario (storing a model and then changing it) happens a lot. I think the best solution is for Trains to create a copy of the file and upload it in the background. That said, the name will still end with .part. What do you think?

4 years ago
0 Hello, If I Set

export CLEARML_DEFAULT_OUTPUT_URI="https://...."
Make sense?
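The equivalent can also be set per task in code (the URI below is just a placeholder):
` from clearml import Task

task = Task.init(project_name='examples', task_name='train',
                 output_uri='s3://my-bucket/clearml-output') `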

3 years ago
0 Is There Any Api Reference? Somewhere In The Docs I Can See The Signature Of Methods/Classes And See What Arguments They Accept And Description? Before I'M Rushing To Ask Questions Here Myself, I'D Prefer To Do As Much Learning As I Can Through The Docs

Hi WackyRabbit7
First, always check the functions on the Task object, they are the most straightforward access to the system.
Then if you need general-purpose API calls, currently they are only documented in the doc-strings of the API schema (that said, they are quite well documented)
You can check all the endpoints https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
And finally if you want to easily use the RestAPI :
` from trains.backend_api.session.client impo...
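For reference, a minimal sketch along those lines (using the old trains package name from the answer; the project id and filters are placeholders):
` from trains.backend_api.session.client import APIClient

client = APIClient()
# e.g. list the 10 most recently updated tasks in a project
tasks = client.tasks.get_all(project=['<project_id>'], order_by=['-last_update'], page_size=10)
for t in tasks:
    print(t.id, t.name) `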

4 years ago
0 For Remote Execution Where The Queue Has

No, after. Do you see the poetry.lock removed in the uncommitted changes?

one year ago
0 Hello! I Get The Following Error In Results->Console After A Task Is Sent For Remote Execution (Using Sdk):

What's strange is that, as soon as the remote jobs are launched, if I compare their configs while in state pending they all have the right (different) configs, but later, while running,

Wait, I think I found it. Usually with Hydra you configure everything from overrides / config, and when launched remotely it looks at those by default. But with the launch plugin it should be overwritten with the Task
` task = Task.init(...)
task.set_parameter(name="Hydra/_allow_omegaconf_ed...

2 years ago
0 Hey Guys, I'M Trying To Run An Experiment Using Trains-Agent. I Have A Custom Docker Image With Nightly Versions Of Pytorch And Our Own Library Installed From A Private Repo. I Was Assuming That These Packages Will Be Automatically Available To Trains Dur

hmmm, somehow I have a bad feeling about it... Could you check the log? It should say something like "Collecting torch==1.6.0.dev20200421+cu101 from https://"
It should be right at the top of the installation. What do you have there?

4 years ago
0 Hi, I Tried To Provide Docker Image From Pipeline Controller Task To Step Task. Before Pipe.Add_Step(), I Created The Task:

Hi ApprehensiveFox95
I think this is what you are looking for:
` step1 = Task.create(
    project_name='examples', task_name='pipeline step 1 dataset artifact',
    repo=' ',
    working_directory='examples/pipeline',
    script='step1_dataset_artifact.py',
    docker='nvcr.io/nvidia/pytorch:20.11-py3'
).id

step2 = Task.create(
    project_name='examples', task_name='pipeline step 2 process dataset',
    repo=' ',
    working_directory='examples/pipeline',
    script='step2_data_pr... `
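Presumably these task ids are then fed into the pipeline steps, something like this (step names are placeholders, and the add_step wiring is only an assumption about the rest of the pipeline code):
` pipe.add_step(name='stage_data', base_task_id=step1)
pipe.add_step(name='stage_process', base_task_id=step2, parents=['stage_data']) `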

3 years ago
0 I Assume I Can Ask A Question Here. The Clearml Orchestrator Looks Interesting. But The Website Suggests That K8S Is Required. We Have A Linux Training Box (Lambdabox) Where We Want To Run Training. Can We Place The Clearml Orchestrator Agent On The M

Hi RobustFlamingo1

The ClearML Orchestrator looks interesting. But the website suggests that K8S is required

No, k8s is not a must, only an option 🙂

We have a Linux training box (LambdaBox) where we want to run training. Can we place the ClearML orchestrator agent on the machine without needing K8S?

Yes should be quite easy.
If you intend to use containers, make sure you have docker installed.
Then just pip install clearml-agent and configure it:
https://clear.ml/doc...

2 years ago
0 When My Remote Task Is Installing The Python Dependencies

/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py
Yep I see it now, could you simulate it locally (i.e. have the other folders in the path as well)?
Could it be you also have a file somewhere called sfi, imagery, models, or chip_classifier that it accidentally tries to import first?

one year ago
0 Hi Guys, We Are Running Clearml-Serving On A Kube Cluster On Aws And We Have Noticed That We Are Getting Some 502 Errors Once In A While That We Can'T Seem To Trace Back.

Hi @<1569858449813016576:profile|JumpyRaven4>

  • The gunicorn logs do not show anything, including any error or trace of the 502; only siege reports the 502, as well as the ALB.

Is this an ALB or an ELB?
What timeout is it configured with?
Do you have GPU instances as well? What's the clearml-serving-inference docker version?

9 months ago
0 Dear Clearml Community, I Am Looking For A Way To Properly Resume A Training In A Way That Initial Scalars Get Reused And Expanded. Clearml Feature For Reusing The Same Task Works Fine (When Using

Oh I see, basically a UI feature.
I'm assuming this is not just changing the x-axis in the UI, but somehow storing the x-axis as part of the reported scalars?

7 months ago
0 So, Here'S A Question. Does Clearml Automatically Save Everything Necessary To Continue Training A Pytorch Language Model? Specifically, I'Ve Been Looking At The Checkpoint Folders Created When I'M Training A Huggingface Robertaformaskedlm. I Checked What

Could I use "register artifact"

I think this is somewhat deprecated and we should probably replace it with something similar to what you mentioned (i.e. watch a file change).
Right now the easiest way would be to manually upload the trainer_state.json at every checkpoint:
` Task.current_task().upload_artifact(name='state', artifact_object='trainer_state.json') `
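If you want to automate that, one possible sketch is a small HF Trainer callback (the class name and artifact name below are just illustrative):
` import os
from transformers import TrainerCallback
from clearml import Task

class UploadTrainerStateCallback(TrainerCallback):
    # called by the Trainer right after a checkpoint folder is written
    def on_save(self, args, state, control, **kwargs):
        path = os.path.join(args.output_dir, f'checkpoint-{state.global_step}', 'trainer_state.json')
        if os.path.isfile(path):
            Task.current_task().upload_artifact(name='trainer_state', artifact_object=path)

trainer.add_callback(UploadTrainerStateCallback()) `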

3 years ago