Infrastructure in k8s
but when I check the health of the cluster, I get a green status:
curl localhost:9200/_cluster/health
` {"cluster_name":"clearml","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":41,"active_shards":41,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_nu...
What's interesting is that ClearML can delete new experiments without any problems,
but it doesn't want to remove old archived experiments.
sure
First command output:
curl -XGET
`
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 xjVdUpdDReCv5g11c4IGFw 1 0 10248782 0 536.6mb 536.6mb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 YuxjrptlTh2MlOCU7ykMkA 1 0 13177592 0 695....
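For reference, a listing like the one above is what Elasticsearch's `_cat/indices` API returns; a minimal sketch of the same query from Python, assuming the cluster is reachable on `localhost:9200` as in the health check earlier:
```
import requests

# List all indices with health, shard counts, doc counts and sizes
# (equivalent to `curl -XGET 'localhost:9200/_cat/indices?v'`).
resp = requests.get("http://localhost:9200/_cat/indices", params={"v": "true"})
print(resp.text)
```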
Nothing)
I’ll talk to the developers. Also, I think I’ve figured out how to solve this problem
I’ve tried with these two
` >>> client.tasks.get_all(system_tags=["archived"])
+----------------------------------+------------------------------------------------------------+
| id | name |
+----------------------------------+------------------------------------------------------------+
| 378c8e80c3dd4ff8901f04f00824acbd | ab-ai-767-easy |
| c575db3f302441c6a977f52c...
ClearML in Kubernetes
worker nodes are bare metal and they are not in k8s yet :(
` - env:
- name: bootstrap.memory_lock
value: "true"
- name: cluster.name
value: clearml
- name: cluster.routing.allocation.node_initial_primaries_recoveries
value: "500"
- name: cluster.routing.allocation.disk.watermark.low
value: 500mb
- name: cluster.routing.allocation.disk.watermark.high
value: 500mb
- name: cluster.routing.allocation.disk.watermark.flood_stage
value: 500mb
...
` [2021-06-11 15:24:36,885] [9] [ERROR] [clearml.service_repo] Returned 500 for queues.get_next_task in 60007ms, msg=General data error: err=('1 document(s) failed to index.', [{'index': {'_index': 'queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-06', '_type': '_doc', '_id': 'PkGr-3kBBPcUBw4n5Acx', 'status': 503, 'error': {'type':..., extra_info=[queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-06][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[queue_metrics...
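A "primary shard is not active" error means the shard for that index is unassigned, so bulk writes time out. One way to see which shards are unassigned and why (a minimal sketch, assuming the same `localhost:9200` endpoint as above):
```
import requests

ES = "http://localhost:9200"

# Per-shard state; UNASSIGNED shards are the ones blocking indexing.
print(requests.get(f"{ES}/_cat/shards", params={"v": "true"}).text)

# Ask Elasticsearch why the first unassigned shard cannot be allocated
# (e.g. disk watermarks exceeded); returns an error if nothing is unassigned.
print(requests.get(f"{ES}/_cluster/allocation/explain").json())
```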
Recently the free space on the PV ran out and the cluster switched to read_only_allow_delete. I tried removing old experiments, but it didn’t help and I got the same error.
Then I changed the size of the PV and added an extra 50Gb
Looks like it helped and now the service is working, but I still get this bug.
Anyway, it would be very cool if there were some additional information about troubleshooting or backups on the site.
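For what it's worth: when the flood-stage watermark is hit, Elasticsearch puts the `index.blocks.read_only_allow_delete` block on the indices, and depending on the ES version the block may not be lifted automatically after space is freed. A minimal sketch of clearing it by hand (the `localhost:9200` endpoint is an assumption, same as above):
```
import requests

ES = "http://localhost:9200"

# Remove the read_only_allow_delete block from all indices so writes are allowed again.
resp = requests.put(
    f"{ES}/_all/_settings",
    json={"index.blocks.read_only_allow_delete": None},
)
print(resp.json())
```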
Yet the experiments have stopped normally: the experiment's details say it was aborted, but at the same time I still see it on the dashboard
and I still see this error in the logs:
` [2022-06-20 13:24:27,777] [9] [WARNING] [elasticsearch] POST [status:N/A request:60.060s]
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/...
AgitatedDove14 I can try but are you sure this will help?
Developers complain that experiments hang in the Pending status for a long time
more than 10 minutes
When I load http://app.clearml.my.domain.com I get Status Code: 426 (Upgrade Required) at http://app.clearml.my.domain.com/v2.13/login.supported_modes (for example)
For now I’ve downloaded the helm chart and added proxy_http_version 1.1; to the nginx config, since 426 means the backend wants HTTP/1.1. After that everything works
Thank you, I understand, but the developers want all packages to be in one place
I recovered the ES data from the backup
It helped.
In our case some packages are taken from /usr/lib/python3/dist-packages, others from the local environment, and this causes a conflict when importing the attr module
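A quick way to see which copy of a package actually wins on import (a minimal sketch, nothing ClearML-specific):
```
import sys
import attr

# Shows whether `attr` resolves to /usr/lib/python3/dist-packages or to the venv's site-packages.
print(attr.__file__)
print(getattr(attr, "__version__", "unknown"))

# The directories Python searches, in order; the first match wins.
for path in sys.path:
    print(path)
```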
old 0.17
new 1.0.2
We partly used the helm charts: we took the YAML files from helm, but we rewrote the PVC part, and our ClearML is spread across several nodes
Can you share the modified helm/yaml?
Yep, here in the attachment: clearml and pvc
Did you run any specific migration script after the upgrade ?
nope, I copied the data from the fileservers and elasticsearch, plus made a mongodump
How many apiserver instances do you have ?
1 apiserver container
How did you configure the elastic container? is it booting?
Standard configuration (clearml.yaml). Elastic works
But what if I don’t want the new venv to inherit everything? I prepared my own image and want to use this venv
at the moment ES has the following resources:
Limits: cpu: 2, memory: 10G
Requests: cpu: 2, memory: 10G
We launched ES with these parameters at the time of the problems
Also I tried to delete tasks via the API, like this:
` >>> from clearml_agent import APIClient
client = APIClient()
client.tasks.get_all(system_tags=["archived"])
+----------------------------------+------------------------------------------------------------+
| id | name |
+----------------------------------+------------------------------------------------------------+
| 41cb804da24747abb362fb5ca0414fe6 | 15....
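The delete call itself got cut off above; a sketch of how it might look with the same `APIClient` (the `tasks.delete` call and its `force` flag are assumptions based on the server API, not copied from the original snippet):
```
from clearml_agent import APIClient

client = APIClient()

# Enumerate archived tasks, then delete them one by one.
# tasks.delete / force=True are assumed from the server's API reference.
archived = client.tasks.get_all(system_tags=["archived"])
for task in archived:
    print("deleting", task.id, task.name)
    client.tasks.delete(task=task.id, force=True)
```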
webserver 127.0.0.1 - - [11/Jun/2021:14:32:02 +0000] “GET /version.json HTTP/1.1” 304 0 “*/projects/cbe22f65c9b74898b5496c48fffda75b/experiments/3fc89b411cf14240bf1017f17c58916b/execution?columns=selected&columns=type&columns=name&columns=tags&columns=status&columns=project.name&columns=users&columns=started&columns=last_update&columns=last_iteration&columns=parent.name&order=last_update” “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
for example webserver
Delete, reset
looks like something is wrong with the index
` index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b 0 2.4h existing_store done n/a n/a 10.18.13.96 cle...
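The listing above looks like `_cat/recovery` output; a sketch of re-running that query and watching only recoveries that are still in progress (again assuming the `localhost:9200` endpoint):
```
import requests

ES = "http://localhost:9200"

# Per-shard recovery progress; active_only=true hides recoveries that already finished.
resp = requests.get(
    f"{ES}/_cat/recovery",
    params={"v": "true", "active_only": "true"},
)
print(resp.text)
```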
And developers complain to me that they can’t start experiments
` APIError: code 500/100: General data error (ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-service', port='9200'): Read timed out. (read timeout=60)))
Failed deleting old session ffaa2192fb9045359e7c9827ff5e1e55
APIError: code 500/100: General data error (ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-service', port='9200'): Read timed out. (read timeo...