Can you fix this, or should I open a PR? I'm blocked by this.
Yes! Thanks so much for the quick turnaround
Note: /home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py is the correct path.
I don't know how to do that.
"title": "Unusual outbound communication seen from EC2 instance i-<> on server port 80.",
"ipAddressV4": "165.160.15.20",
"organization": {
"asn": "19574",
"asnOrg": "CSC",
"isp": "Corporation Service Company",
"org": "Corporation Service Company"
},
"country": {
"countryName": "United States"
},
"city": {
...
Yeah, let me unwind some changes so I can pinpoint the issue.
However, the subprocess calls are fairly important to our code base, hence the problem.
IMO, the dataset shouldn't be tied to the clearml.conf URLs it was uploaded with, as that URL could change. It should respect the file server URL the agent has.
Ahhh, it's possible my clearml.conf was using the public URLs when I made it. Let me try this.
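For reference, a minimal sketch of what I'm going to retry (the file server address, dataset name, and path below are placeholders, not our real values): the point is just that the SDK resolves the file server from CLEARML_FILES_HOST / api.files_server at upload time, so the dataset links end up pointing wherever that was set.
```python
import os
from clearml import Dataset

# Assumption: override the file server before creating the dataset, so the stored
# links point at the internal server instead of the public URL from my old clearml.conf.
# "http://files.internal:8081" is a hypothetical address.
os.environ["CLEARML_FILES_HOST"] = "http://files.internal:8081"

# Placeholder dataset name/project/path.
ds = Dataset.create(dataset_name="imagery", dataset_project="commons")
ds.add_files("/data/imagery")
ds.upload()    # files go to whatever file server the SDK resolved above
ds.finalize()
```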
it uses the default of epoch
For example, if my GitHub repo is project.git and my structure is project/utils/tool.py
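As a concrete sketch of that layout (assuming the agent clones project.git and runs from the repo root; the entry-point name below is made up):
```python
# project/train.py -- hypothetical entry point at the repo root
from clearml import Task

# With the repo root as the working directory, project/utils/tool.py is
# importable as a regular module path.
from utils import tool

task = Task.init(project_name="examples", task_name="train")
print(tool.__file__)  # just to show where the module was resolved from
```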
Yeah, I've done that already, but I can do it again.
Made some progress getting the GPU nodes to provision, but got this error on my task: K8S glue status: Unschedulable (0/4 nodes are available: 1 node(s) had taint {nvidia.com/gpu: true}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity/selector.)
AgitatedDove14 How do I set up a master task to do all the reporting?
When I click on a task's details -> Info tab, it seems like each task is set up to run on a single pod/node, based on attributes like GPU memory, OS, number of cores, and worker.
AWS; I've set up the shared memory between the k8s nodes.
Okay, so basically the DL framework manages the master/worker relationship. I just need to use pod replicas for my k8s agents.
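To spell out what I mean by the framework handling it, a minimal PyTorch sketch (the address, port, and rank values are placeholders that the launcher would normally inject into each replica pod's environment):
```python
import os
import torch.distributed as dist

# Assumed placeholders: in a real deployment these env vars come from the
# launcher / pod spec, with a distinct RANK per replica and the real WORLD_SIZE.
# WORLD_SIZE=1 here just so the sketch runs standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # e.g. a headless-service DNS name in k8s
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# The framework wires up the master/worker relationship from the env:// variables.
dist.init_process_group(backend="gloo", init_method="env://")  # "nccl" on the GPU pods
print(f"rank {dist.get_rank()} of {dist.get_world_size()} is up")
dist.destroy_process_group()
```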
Does the clearml module parse the Python packages? If I'm using a private PyPI artifact server, would I set PIP_INDEX_URL on the workers so they could retrieve those packages when that experiment is cloned and re-run?
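For my own notes, a minimal sketch of the client-side half (the package name and version are made up; the private index itself would, as far as I can tell, be configured on the worker side via PIP_INDEX_URL or the agent's pip settings):
```python
from clearml import Task

# Packages imported by the running script are recorded at Task.init time;
# anything it can't detect can be added explicitly beforehand.
# "internal-imagery-models" is a hypothetical private package.
Task.add_requirements("internal-imagery-models", "1.2.3")

task = Task.init(project_name="examples", task_name="deps-check")
```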
Yep, got it. I was under the impression I could set those values in the UI, but I now see they are parsed from my local workstation.
I guess I'm confused about venv mode vs. docker mode. It seems like I'm passing in my own docker image, which is then used at runtime?
How does a task specify which docker image it needs?
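For reference, this is the kind of thing I had in mind, as a minimal sketch (the image tag is just an example, not a recommendation):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker-image-demo")

# Attach a docker image to the task itself; when an agent running in docker
# mode picks up this task (or a clone of it), it should use this container.
task.set_base_docker("nvidia/cuda:11.7.1-runtime-ubuntu22.04")
```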
I made the PR here JuicyFox94 AgitatedDove14 https://github.com/allegroai/clearml-helm-charts/pull/106