SubstantialElk6
Moderator
114 Questions, 310 Answers
  Active since 10 January 2023
  Last activity 4 months ago

Reputation

0

Badges 1

282 × Eureka!
0 Votes 1 Answers 235 Views
Hi, what would happen if you have different clearml-agents of different versions running? Would it have any adverse effects?
2 years ago
0 Votes 3 Answers 309 Views
Hi, I was adding data using clearml-data and consistently get the following errors. Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)...
one year ago
0 Votes 7 Answers 280 Views
Hi, i am trying to use clearml-data to upload my data to S3, which is password protected. How should i indicate the credentials after i set --storage s3://.....
2 years ago
0 Votes 6 Answers 234 Views
2 years ago
0 Votes 2 Answers 202 Views
2 years ago
0 Votes 4 Answers 230 Views
Hi, I'm running clearml agents via K8s glue. I noticed that the agent is not pulling latest images even though docker_force_pull is set to true. A kubectl de...
2 years ago
0 Votes 15 Answers 243 Views
Hi, I noted that clearml-serving does not support spaCy models out of the box and that clearml-serving only supports the following: Support Machine Learning Mode...
one year ago
0 Votes 1 Answers 215 Views
Hi, I'm running ClearML jobs using K8SGlue. When the job is running, the scalar for monitor:machine seems to be reporting Node statistics instead of the Pod ...
6 months ago
0 Votes 0 Answers 204 Views
2 years ago
0 Votes 5 Answers 265 Views
one year ago
0 Votes 3 Answers 266 Views
I'm getting this when running with Keras framework. clearml.storage - ERROR - Failed uploading: [Errno 21] Is a directory: 'model.savedmodel'.
2 years ago
0 Votes 1 Answers 228 Views
Hi, can Clearml-Server support ReplicaSet in K8S?
one year ago
0 Votes 30 Answers 238 Views
Hi, I'm getting this long error when running task.execute_remotely(queue_name="1gpu", exit_process=True). I also noticed an error Failed to fetching activit...
2 years ago
0 Votes 14 Answers 214 Views
So I bumped into this comparison shared by dagshub. It kinda placed ClearML in a rather bad position compared to everything else in the industry. https://dag...
2 years ago
0 Votes 5 Answers 203 Views
Hi, how do I switch clearml server to run on https with a self signed cert?
2 years ago
0 Votes 6 Answers 283 Views
2 years ago
0 Votes 5 Answers 269 Views
Hi, i'm running the following and encountering some SSL errors. SSL_CERT_FILE=ca.crt clearml-data upload --id 12314jhg42342j4j --storage clearml.storage - ER...
2 years ago
0 Votes 10 Answers 267 Views
Hi, v1 of agent seems to have removed agent.package_manager.force_repo_requirements_txt. Is this still available in other forms?
2 years ago
0 Votes 26 Answers 238 Views
Hi, my DevSecOps team has raised some issues with us deploying ClearML. In particular, they are not happy with the docker.sock configuration as it would po...
2 years ago
0 Votes 4 Answers 241 Views
Hi, I am trying to understand clearml-data and only found this article explaining it. https://github.com/allegroai/clearml/blob/master/docs/datasets...
2 years ago
0 Votes 4 Answers 241 Views
Hi, I'm working on post-deployment data and model monitoring using ClearML. The idea is this. Use ClearML to serve my model out to Triton. Data MonitoringC...
one year ago
0 Votes 22 Answers 232 Views
Hi, ClearML console leaks credentials passed in as Env Vars. The issue remains with clearml version==1.1.1.135 - 1.1.1 - 2.1.4 (As listed on the profile page...
2 years ago
0 Votes 23 Answers 225 Views
Hi i saw this on the clearml-agent docs but other than the docker image, i'm not sure how to integrate this with clearml py and clearml-server. Please advise...
2 years ago
0 Votes 1 Answers 271 Views
[Distributed Training] Hi, i have a ClearML setup with K8SGlue that spins up pods of 4 GPUs when picking tasks off the clearml queue. We would now want to pr...
7 months ago
0 Votes 1 Answers 107 Views
Hi. For the experiment scalar tab, there's a gpu resource graph. The gpu mem used is in percentage, is it possible to display as absolute GB instead? Reason ...
3 months ago
0 Votes 30 Answers 253 Views
one year ago
0 Votes 3 Answers 221 Views
Hi, i have a docker image that needs to be run in privileged mode. How should i do the following? clearml-session: Pass the --privileged option along --docker ?
2 years ago
0 Votes 10 Answers 235 Views
2 years ago
0 Votes 3 Answers 234 Views
Just wondering, why aren't you guys getting yourselves known in GTC?
2 years ago
0 Votes 8 Answers 225 Views
I'm just getting this in my agent run task. Would appreciate it if someone can advise where externalrequirement is pointing at. RequirementsManager handler rais...
2 years ago
0 Hi, We Recently Upgraded Clearml To 1.1.1-135 . 1.1.1 . 2.14. The Task Init Is

Hi,
I'm running on Dell ECS storage appliance, which offers S3 compatibility.
yes http://ECS.ai is the DNS name of the server.
ClearML-models is the bucket.
Let me try with ip:port.

2 years ago
0 Hi, We Have Recurring Disk Space Issues On Our Clearml Server (Drop Of Many Gb In A Few Days). After Some Analysis, We Noted

Thanks SuccessfulKoala55 , how might I do this clean up? Does this increase with more use of ClearML? And to add, we save all artifacts onto a remote S3 server.

one year ago
0 Hi, We Have Recurring Disk Space Issues On Our Clearml Server (Drop Of Many Gb In A Few Days). After Some Analysis, We Noted

ok thanks. this would mean that increasing the disk space for my ClearML is the only option as we are not at liberty to delete.

one year ago
0 Hi, How Can I Make A Stage In A Clearml Pipeline Non-Blocking? The Scenario Is That Stages Downstream Needed Runtime Info From The First Stage, However The First Stage Needs To Continue Running To Act As A Monitor For The Other Downstream Stages.

Yes it is! But ClearML didn't support multi-node training out of the box in a way that streamlines the process. So we are trying to figure out a way to do it.

5 months ago
0 Hi, I Am Trying To Understand Clearml-Data And Only Found This Piece Of Article Explaining It.

Hi erez, I think I would want to reference the code that transformed the data. Take for example, I received 10k images, performed some transformation and saved it as the next version before splitting it up for my ML training. Some time later, I receive a new set of 10k images and want to apply the same transformation and then append it to the previous 10k as another version. Clearml-data does well for the data-versioning part, but in terms of data provenance, it's not clear how I can associate t...

2 years ago
0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Hi.

We tried as advised above and it still didn't work.
Host: http://ecs.ai:443
output_uri = S3://ecs.ai:443/bucketname

This time round the client gave this error.
botocore.exceptions.ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: 'http://ecs.ai/bucketname/.clearml.test'.

It's quite apparent that whatever ClearML passed to boto3 ends up as an HTTP call instead of HTTPS, which is wrong.
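
A quick way to spot this kind of mismatch is to parse the configured host and compare its scheme with its port. The sketch below uses only the Python standard library. The helper name `check_endpoint` and the port table are assumptions made for illustration, and the code does not reproduce what ClearML or boto3 do internally.

```python
from urllib.parse import urlsplit

# Conventional default ports; an assumption used only to flag odd combinations.
DEFAULT_PORTS = {"http": 80, "https": 443}


def check_endpoint(url):
    """Return (scheme, host, port, warning) for an endpoint URL.

    warning is None when the scheme and port look consistent.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Fall back to the conventional port when the URL does not name one.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    warning = None
    for other_scheme, default_port in DEFAULT_PORTS.items():
        if port == default_port and scheme != other_scheme:
            warning = (
                f"port {port} is the usual {other_scheme} port, "
                f"but the scheme is {scheme}"
            )
    return scheme, parts.hostname, port, warning


if __name__ == "__main__":
    # The host value quoted above: plain http on port 443.
    print(check_endpoint("http://ecs.ai:443"))
    print(check_endpoint("https://ecs.ai:443"))
```

With the host quoted above, the first call returns a warning because port 443 is paired with the `http` scheme, while the `https` variant returns no warning.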

2 years ago
0 Hi, I Have A Future Roadmap Question On Clearml-Datasets. The Current Implementation Works Well For Small Datasets But Its Rather In Effective For Very Large Datasets. For Example, Let'S Say I Have 10 Million Images Just For The Training Dataset, And My T

Yes! I definitely think this is important, and hopefully we will see something there 

 (or at least in the docs)

Hi AgitatedDove14 , any updates in the docs to demonstrate this yet?

one year ago
0 Hi, I'Ve A Few Questions On Clearml-Session.

Hi AgitatedDove14 , thanks.
In this case I am running the K8s glue (machine glue), which will then spawn off pods on a Kubernetes worker (machine worker). So when you say direct access, are you referring to the Glue machine or the K8S Worker machine?

2 years ago
0 Hi, How Can I Pass A Env Variable To The Docker That'S Running The Agent When I Run This? I'M Havving Issues With The Agent'S Git Clone Where It Requires Sslverification To Be Disabled. Clearml-Agent Daemon --Gpus 0 --Queue Gpu --Docker --Foreground

After some churning, this is the answer. Change it in the clearml-agent init generated clearml.conf.

` default_docker: {
    # default docker image to use when running in docker mode
    image: "nvidia/cuda:10.1-runtime-ubuntu18.04"

    # optional arguments to pass to docker image
    # arguments: ["--ipc=host", ]
    arguments: ["--env GIT_SSL_NO_VERIFY=true",]
  } `
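
To illustrate why an `arguments` entry like the one above reaches the container, here is a rough sketch of how such entries could be spliced into a `docker run` command line. This is not the agent's actual code: `build_docker_command` is a hypothetical helper, and only the image name and the `--env` argument come from the config above.

```python
import shlex


def build_docker_command(image, extra_arguments, command=("bash",)):
    """Hypothetical helper: assemble a docker run argv from config-style values."""
    argv = ["docker", "run", "--rm"]
    for argument in extra_arguments:
        # An entry such as "--env GIT_SSL_NO_VERIFY=true" holds two tokens,
        # so split it the way a shell would.
        argv.extend(shlex.split(argument))
    argv.append(image)
    argv.extend(command)
    return argv


if __name__ == "__main__":
    cmd = build_docker_command(
        "nvidia/cuda:10.1-runtime-ubuntu18.04",
        ["--env GIT_SSL_NO_VERIFY=true"],
    )
    # Print the command as a single shell-quoted string.
    print(shlex.join(cmd))
```

The point of the sketch is only that the environment variable has to be set on the container itself, since that is where the git clone runs.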

2 years ago
0 Hi, I'M Getting This Long Error When Running

[root@2c7498711bef elasticsearch]# curl `
{
"cluster_name" : "clearml",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4,
"active_shards" : 4,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 8,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" ...

2 years ago
0 Hi, I'Ve A Few Questions On Clearml-Session.

Ok thanks, we'll try it out on next availability.

2 years ago
0 Hi, I'M Getting This Long Error When Running

[root@2c7498711bef elasticsearch]# curl `
{
"index" : "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b",
"shard" : 0,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2021-05-22T11:33:38.932Z",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisi...

2 years ago
0 Hi, The `

ah thanks. Hopefully the old ones get flushed out by Google soon.

2 years ago
0 Hi, How Can I Make A Stage In A Clearml Pipeline Non-Blocking? The Scenario Is That Stages Downstream Needed Runtime Info From The First Stage, However The First Stage Needs To Continue Running To Act As A Monitor For The Other Downstream Stages.

The first stage is a rank0 pytorch script. The downstream stages are rankN scripts, they are waiting for the IP address of the first stage. But the first stage doesn’t return, it simply waits for the rankN scripts to connect to it. But in this case, the rankN scripts don’t start. So it’s probably necessary to have just a single stage.

If i were to start a single rank0, and subsequent rankN tasks, it would be rather messy on ClearML Dashboard. Best to have either a single clearml application...

5 months ago
0 Hi, I'M Getting This Long Error When Running

Alright thanks, i will work on that.

2 years ago
0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

i see. Can i take it that when the client uses
task.execute_remotely(queue_name="1gpu", exit_process=True) then none of the content in its clearml.conf will be used, except for the API part? And ClearML simply uses whatever is on the Agent side.
api { # Notice: 'host' is the api server (default port 8008), not the web server. api_server: web_server: files_server: # Credentials are generated using the webapp, `
# Override with os environment: ...

2 years ago
0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

My assumption is that the agent will have pulled that off the client's clearml.conf.

2 years ago
0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Do you have more info on vault?
Actually it only makes sense if the entire department or organisation is saving its models in a common repo. In our case this is not possible due to client security (e.g. training data from clients can potentially be 'reverse engineered' from trained models in future). So each department and even each project will need its own repo.

2 years ago
0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

It would make sense on a very large resource cluster. Unfortunately we have fewer than 50 GPUs to share. A multi-tenant SaaS would cut the resources into even smaller clusters and not help with efficiency. Or would you have a suggestion?

2 years ago
0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

Which clearml.conf is it referring to? I'm executing on my client, which is then remotely executed by the agent. Both of them have ~/clearml.conf.

2 years ago
0 Hi, V1 Of Agent Seems To Have Removed Agent.Package_Manager.Force_Repo_Requirements_Txt. Is This Still Available In Other Forms?

yup. in this case it wasn't root. Removing that USER and -u in pip solves the problem. However, in our production images, we are required to remove root access.
` FROM nvidia/cuda:10.1-cudnn7-devel

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev git wget sudo ninja-build
RUN ln -sv /usr/bin/python3 /usr/bin/python

# create a non-root user

ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} a...

2 years ago
0 Hi, I Shifted My Clearml Setup To An On-Premise Disconnected Env, Which Has A Pip Repo Setup. I Noted This Warning,

Hi AgitatedDove14 , I changed everything to cuda 10.1 and tried again with the same error. The section is as follows. I made sure torch==1.6.0+cu101 and torchvision==0.8.2+cu101 are in the pypi repo. But the same error still came up.
` # Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
boto3 == 1.14.56
clearml == 0.17.4
numpy == 1.19.1
torch == 1.6.0
torchvision == 0.7.0

Detailed import analysis

**************************

IMPORT PACKAGE boto3

clearml.storage: 0

IMPORT PACKAG...

2 years ago
0 Hi, I Shifted My Clearml Setup To An On-Premise Disconnected Env, Which Has A Pip Repo Setup. I Noted This Warning,

I can't seem to find the fix to this. Ended up using an image that comes with torch installed.
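
One thing that can be checked offline in a setup like this is whether every pinned requirement has a matching version in the private index. The sketch below is only a sanity check under stated assumptions: `unmatched_pins` and `base_version` are hypothetical helpers, the sample values are the versions quoted earlier in this thread, and the comparison ignores local labels such as `+cu101`. It says nothing about how the agent itself resolves torch packages.

```python
def base_version(version):
    """Drop a local version label such as '+cu101'."""
    return version.split("+", 1)[0]


def unmatched_pins(pins, available):
    """Return package names whose pinned version has no match in the index.

    pins:      {"package": "pinned version"}
    available: {"package": ["version held in the private index", ...]}
    """
    missing = []
    for package, pinned in pins.items():
        candidates = {base_version(v) for v in available.get(package, [])}
        if base_version(pinned) not in candidates:
            missing.append(package)
    return missing


if __name__ == "__main__":
    # Pins from the task's requirements and versions uploaded to the repo.
    pins = {"torch": "1.6.0", "torchvision": "0.7.0"}
    available = {"torch": ["1.6.0+cu101"], "torchvision": ["0.8.2+cu101"]}
    print(unmatched_pins(pins, available))
```

With these sample values the check reports `torchvision`, since the pinned 0.7.0 is not among the versions placed in the repo.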

2 years ago
0 Hi, I Shifted My Clearml Setup To An On-Premise Disconnected Env, Which Has A Pip Repo Setup. I Noted This Warning,

Hi AgitatedDove14 , what version should I change it to? I'm currently on v0.17.2rc3.

2 years ago