@<1636537836679204864:profile|RipeOstrich93> , can you make sure that the Additional ClearML Configuration for the autoscaler app includes agent.extra_docker_arguments: ["--ipc=host"]?
OK - this looks like either the server is down or there's a communication issue (most likely a firewall)
So the flow, if you want to use clearml-task, is as follows:
You install ClearML Agent on your worker machine (DGX, in your case). The agent monitors the ClearML Server for one or more specific queues and waits for tasks to be enqueued there. The Agent should be configured with the correct clearml.conf file in order to be able to access the server. You use clearml-task to create new tasks. clearml-task will create a task as you specify, and will enqueue it to the queue of your ...
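To make the two sides of this flow concrete, here is a minimal sketch that only builds the two command lines (one for the worker, one for the machine creating the task) and prints them. The queue, project, task name, and script values are placeholders and do not come from this thread; the helper function names are made up for illustration.

```python
import shlex


def agent_command(queue: str) -> list[str]:
    """Command that starts an agent listening on `queue` (run on the worker)."""
    return ["clearml-agent", "daemon", "--queue", queue]


def task_command(project: str, name: str, script: str, queue: str) -> list[str]:
    """Command that creates a task from a local script and enqueues it."""
    return [
        "clearml-task",
        "--project", project,
        "--name", name,
        "--script", script,
        "--queue", queue,
    ]


# Placeholder values, chosen only for illustration.
queue = "dgx_queue"
print(shlex.join(agent_command(queue)))
print(shlex.join(task_command("examples", "remote run", "train.py", queue)))
```

The important part is that both commands name the same queue: the agent only picks up tasks from the queues it was started with.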
@<1658281099807166464:profile|SmallCamel52> which agent version are you using?
@<1523704674534821888:profile|SourLion48> this seems like something on the network is blocking the communication to the server - either on your machine, or in the cloud load-balancers, or on the target machine before the requests get to the server
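A quick way to narrow this down is to check whether a plain TCP connection to the server can be opened from the machine that fails. The sketch below uses only the Python standard library; the ports 8080 (web UI), 8008 (API) and 8081 (file server) are the defaults of a self-hosted ClearML Server and are an assumption here, so adjust them if your deployment differs. The host name in the usage comment is a placeholder.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_server(host: str, ports=(8080, 8008, 8081)) -> dict:
    """Map each port to whether it is reachable from this machine."""
    return {port: can_connect(host, port) for port in ports}


# Example (placeholder host name):
# print(check_server("clearml.example.com"))
```

If a port is reachable from one machine but not from another, the block is somewhere between the failing machine and the server rather than in the server itself.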
Usually if you restart your internet provider's hardware (i.e. modem etc.), and assuming you're allocated a new IP each time it connects (doesn't always happen)
Might be a typo in the new configuration?
This was so we can see the agent log
Hi @<1644147961996775424:profile|HurtStarfish47> , in the ClearML SDK, all downloaded files are cached in the cache folder, the same goes for downloaded dataset files. By default, this cache is mapped to ~/.clearml/cache and contains the last 100 files downloaded.
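If you want to see what is currently sitting in that folder, a small standard-library sketch like the one below will count the files and their total size. It only inspects the directory and does not use the ClearML SDK; the helper name is made up for illustration, and the path assumes the default location mentioned above.

```python
from pathlib import Path


def summarize_cache(cache_dir: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all files under cache_dir."""
    if not cache_dir.is_dir():
        return (0, 0)
    files = [p for p in cache_dir.rglob("*") if p.is_file()]
    return (len(files), sum(p.stat().st_size for p in files))


# Default cache location, assuming it was not overridden in clearml.conf.
default_cache = Path.home() / ".clearml" / "cache"
count, size = summarize_cache(default_cache)
print(f"{default_cache}: {count} files, {size / 1e6:.1f} MB")
```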
Hi ElegantCoyote26 , most certainly 🙂
@<1526734383564722176:profile|BoredBat47> if your S3 server is using https (which I assume it is) the port will be 443
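As a small illustration of where 443 comes from: when an endpoint URL does not name a port, the scheme decides the default (80 for http, 443 for https). The sketch below derives a host:port string from an endpoint URL using only the standard library; the helper name and the example host names are hypothetical.

```python
from urllib.parse import urlsplit

# Well-known default ports per URL scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint_host_port(url: str) -> str:
    """Return 'host:port', filling in the scheme's default port if none is given."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return f"{parts.hostname}:{port}"


print(endpoint_host_port("https://s3.example.com"))
print(endpoint_host_port("http://minio.local:9000"))
```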
Hi EmbarrassedSpider34 ,
Wasn't the name there when you generated the plot?
Oh, sorry! both user_key and user_secret are with a single underscore :)
I might be wrong about the company_id 🙂 - but in case I'm wrong, you'll see it quickly since you won't be able to login with new users to the UI 🙂
CLEARML__SECURE__HTTP__SESSION_SECRET__APISERVER
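For context, the variable name follows the server's double-underscore convention for overriding configuration through the environment: as far as I know, the part after the CLEARML__ prefix is the configuration path, starting with the configuration file name, with __ separating the elements. The sketch below only illustrates that naming convention; the helper is hypothetical and is not part of ClearML.

```python
PREFIX = "CLEARML__"


def env_to_config_path(name: str) -> str:
    """Translate a CLEARML__A__B__C variable name into the dotted path a.b.c."""
    if not name.startswith(PREFIX):
        raise ValueError(f"{name!r} does not start with {PREFIX!r}")
    # Single underscores (as in SESSION_SECRET) stay inside one path element.
    return ".".join(part.lower() for part in name[len(PREFIX):].split("__"))


print(env_to_config_path("CLEARML__SECURE__HTTP__SESSION_SECRET__APISERVER"))
```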
No, this is unrelated
are you sure you're using the latest docker images?
If you have an agent with a key and a secret, you basically just need to make sure the agent is monitoring the queue in which you plan to enqueue your tasks
We plan to make this automatic in the next version
TrickySheep9 can I see your code? Or at least the relevant part?
Hi @<1523708340473958400:profile|SweetHippopotamus84> , since the agent is designed to run remotely and not interactively, and MFA is by design interactive, that's not possible
Can you give an example of how you use matplotlib?
Can you please try the latest version ( 1.12.2rc0 )?
Hi @<1541954607595393024:profile|BattyCrocodile47> the main issue is indeed testing, it will probably take us some time to verify this. If you can provide some more testing results in various scenarios it might be good enough to justify a merge 🙂
Hi @<1600661428556009472:profile|HighCoyote66> , what exactly is the flow you're currently using? In ClearML, experiments are created on the server by running local code (allowing the ClearML SDK to capture their properties and dependencies) or alternatively using the clearml-task CLI to create the task by specifying all properties and dependencies, then these experiments can be cloned and queued for remote execution - at this point, the agent running the task remotely can set up an envi...
I see it's running inside 3.9, so I assume it's correct
Hi @<1610083503607648256:profile|DiminutiveToad80> , please contact support@clear.ml for this 🙂