Hi @<1533619716533260288:profile|SmallPigeon24> , I think the worker ID depends on the worker name, so you can control it this way
@<1533619725983027200:profile|BattyHedgehong22> , it appears from the log that it is failing to clone the repository. You need to provide credentials in clearml.conf
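A minimal sketch of the relevant clearml.conf section on the agent machine, assuming you clone over HTTPS (the user and token values below are placeholders):

```
# clearml.conf -- placeholder credentials, used by the agent when cloning repos
agent {
    git_user: "my-git-user"
    git_pass: "my-git-token"
}
```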
Hi @<1580367711848894464:profile|ApprehensiveRaven81> , for a frontend application you basically need to build something that has access to the serving solution.
And you use the agent to set up the environment for the experiment to run?
So you can download/view files on the cloud
There aren't any specific functions for this. But all of this information sits on the task object. I suggest running dir(task) to see where this attribute is stored
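As a rough illustration of the dir() approach - using a stand-in class here, since the real attributes live on the ClearML Task object:

```python
# Stand-in for a ClearML Task object -- illustration only, not the real class
class FakeTask:
    def __init__(self):
        self.last_iteration = 1000
        self.comment = "example"

task = FakeTask()

# List the public attributes/methods so you can spot the one you need
attrs = [name for name in dir(task) if not name.startswith("_")]
print(attrs)  # ['comment', 'last_iteration']
```

The same pattern applied to a real Task object will show you everything the SDK exposes on it.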
Hi @<1739455977599537152:profile|PoisedSnake58> , in the log you have the location of the cloned repo printed out.
For CLEARML_AGENT_EXTRA_PYTHON_PATH you need to provide it with a path
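For example (the directory below is a placeholder for wherever your extra modules live; as far as I understand, the agent makes this path available to the executed task's Python environment):

```shell
# Placeholder path -- set before starting the agent
export CLEARML_AGENT_EXTRA_PYTHON_PATH=/opt/extra_modules
```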
And from the error you get, as I mentioned, it looks like there is no 'services' queue. I would check the logs of the agent-services container to see if you get any errors, as this is the agent in charge of listening to the 'services' queue
Hi @<1673501397007470592:profile|RelievedDuck3> , there is some discussion of it in this video None
@<1556812486840160256:profile|SuccessfulRaven86> , you can specify different containers in clearml.conf
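For example, the default container can be set in the agent section of clearml.conf - the image name here is just an example:

```
# clearml.conf -- default container used by the agent in docker mode
agent {
    default_docker {
        image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
        # extra docker arguments can go here as well, e.g.
        # arguments: ["--ipc=host"]
    }
}
```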
Also please note that your path is wrong
Is it possible that it's creating separate datasets? Can you post logs of both processes?
From my understanding, ClearML uses the Apache-2.0 license, so it depends on whether that covers your use case or not
WackyRabbit7 , in that case you can simply register the pretrained model in the system using the SDK 🙂
In the top left you have 'Console' - see if you get any errors there
Hi OddShrimp85 , you sure can! You can use the API. This one is useful for getting data about specific tasks. I think you'll have to sift through the response to find what you need 🙂
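As a rough sketch of the "sifting" part, assuming you already have the JSON response from the tasks endpoint - the structure below is a simplified stand-in, not the exact API schema:

```python
import json

# Simplified stand-in for an API response body -- the real schema differs
response_text = '{"data": {"task": {"id": "abc", "name": "demo", "status": "completed"}}}'
response = json.loads(response_text)

# Drill down to just the field you care about
status = response["data"]["task"]["status"]
print(status)  # completed
```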
Hi @<1637624992084529152:profile|GlamorousChimpanzee22> , did you test this section and it doesn't work or you just didn't find in the code where it's being read?
I see. Just to simplify the issue - When using pytorch - were you getting machine usage reports (CPU/GPU usage)?
Also, I'm guessing various scalars aren't being reported - were those previously captured automatically by ClearML?
Hi @<1570583237065969664:profile|AdorableCrocodile14> , how did you upload the image?
The communication is done via HTTPS, so the relevant ports should be open.
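For reference, these are the endpoints (with the default ports) the SDK talks to, as configured in the api section of clearml.conf - the hostnames here are placeholders:

```
# clearml.conf -- placeholder hostnames; note the default ports
api {
    web_server: https://clearml.example.com:8080    # web UI
    api_server: https://clearml.example.com:8008    # API server
    files_server: https://clearml.example.com:8081  # file server
}
```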
Did you try with a hotspot connection from your phone?
ReassuredTiger98 , I played with it myself a little bit - it looks like this happens for me when an experiment is running and reporting images, and changing the metric does the trick, i.e. it reproduces the issue. Maybe open a GitHub issue to follow up on this 🙂
Hi VivaciousBadger56 , this is a good question. In my opinion it's best to start with viewing videos from ClearML's YouTube channel. This one is especially useful:
https://www.youtube.com/watch?v=quSGXvuK1IM
As for which steps to take, I think the following should cover most bases:
- Experiment tracking & management - check that you can see all of the expected outputs in the ClearML web UI
- Remote experiment execution - try to execute an experiment remotely using the agent. Change some c...
How would the EC2 instance get the custom package code?
Hi @<1695969549783928832:profile|ObedientTurkey46> , is this happening when running on top of the agent or locally?
With the autoscaler it's also easier to configure a large variety of compute resources. Although if you're only interested in p4-equivalent instances available on demand quickly, I can understand the issue
Is on-prem also K8s? The question is: if you run code unrelated to ClearML on EKS, do you still get the same issue?