
"ipAddressV4": "165.160.15.20",
"organization": {
"asn": "19574",
"asnOrg": "CSC",
"isp": "Corporation Service Company",
"org": "Corporation Service Company"
},
"country": {
"countryName": "United States"
},
"city": {
...
You guys are the maintainers of this repo
I'm not familiar enough with helm to clone this, fix it, and then test
Yes! Thanks so much for the quick turnaround
As they are singular not plural
AWS, I've set up the shared memory between k8 nodes
However, the subprocess calls are somewhat important to our code base, hence the problem
so it caches to ~/.clearml/ any files that are under the same project name?
When I click on a task's details -> Info tab, it seems like each task is set up to run on a single pod/node based on attributes like gpu memory, os, num of cores, worker
Do you want me to PR that fix?
That is the problem, the if condition is not evaluating to True
Okay, so basically the DL framework manages the master/worker relationship. I just need to use pod replicas for my k8 agents.
I made the PR here JuicyFox94 AgitatedDove14 https://github.com/allegroai/clearml-helm-charts/pull/106
I was able to get this working by putting Task.init() under __main__
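For anyone reading along, here is a rough sketch of what "putting Task.init() under __main__" can look like. It is not the actual script from this thread: `init_tracking` is a hypothetical stand-in for the real `Task.init(...)` call, and `train` and `--epochs` are made up, so the snippet runs without a ClearML server.

```python
import argparse


def init_tracking():
    """Hypothetical stand-in for Task.init(); assumed for this sketch only."""
    # In a real script this would be something like:
    #   from clearml import Task
    #   return Task.init(project_name="...", task_name="...")
    return {"initialized": True}


def train(epochs):
    # Placeholder training loop: returns the list of epochs it "ran".
    return list(range(epochs))


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    # parse_known_args ignores unrelated flags instead of exiting.
    args, _unknown = parser.parse_known_args(argv)
    return train(args.epochs)


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is
    # imported (for example by a child process that re-imports the module).
    init_tracking()
    main()
```

The point of the guard is that importing the module has no side effects; the tracking call happens once, in the process that was launched as the script.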
I just opened a shell with the api and tried to curl my files URL, and the curl just hangs. no response
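A hanging curl usually means the TCP connection never completes. A small sketch for checking that from inside a pod with an explicit timeout, using only the Python standard library; `can_connect` is a hypothetical helper name, and it only tests the TCP connection, not the HTTP response.

```python
import socket
from urllib.parse import urlsplit


def can_connect(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL has none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failure, and timeout.
        return False
```

If this returns False for the files URL but True for the API URL, the problem is likely network reachability (firewall, security group, service address) rather than the client.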
SuccessfulKoala55 Figured it out, I needed to use 4.2.0
For instance, in my repo I have a setup.py. How would I run pip install -e . ?
ahhh it's possible my clearml.conf was using the public urls when I made it. Let me try this
Are there any workarounds for this issue? Our team is evaluating this product to potentially buy an enterprise license. If we can't fetch data, this is a problem.
I think the best change would be to respect the value set at https://github.com/allegroai/clearml-helm-charts/blob/19a6785a03b780c2d22da1e79bcd69ac9ffcd839/charts/clearml-agent/values.yaml#L50 so you could change it down the road if infra/hosting changes. Also, in this case I'm uploading the data to the public file server URL, but my k8 pod can't reach that for security reasons.
Seems related to this https://github.com/allegroai/clearml/issues/241
Sure. My git repo myProject.git does not have file.json checked into VCS. I'd like to add this file at experiment runtime or equivalent.
AgitatedDove14 note the missing brackets https://github.com/allegroai/clearml-helm-charts/blob/main/charts/clearml-agent/templates/agentk8sglue-deployment.yaml#L22
When I did Task.init() in train.py, the CLI arguments needed for main.py don't get captured and the script fails right away. Note this is running with --skip-task-init since train.py has Task.init()