Thanks @<1547028074090991616:profile|ShaggySwan64> !!
Passing to the backend guys to take a look
That sounds like an issue with the "working dir"; check the "Execution" section's "Working Directory" field.
'.' means the root of the git repository.
'subfolder' means run the script from that subfolder, etc. Also make sure that the script path is adjusted accordingly.
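For example, a minimal sketch (the repo layout and names here are hypothetical):

my_repo/
    subfolder/
        train.py

Here "Working Directory" would be 'subfolder' and the script path just 'train.py'.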
btw: Trains should have filled in all the correct paths... If you have time, get the latest trains (0.14.3) and run again to see if the problem persists; we should probably fix that bug 🙂
Yea I know, I reported this
LOL, apologies, these days it's a miracle I still remember my login passwords 🙂
VictoriousPenguin97 I'm assuming the exact same server version ?
Yeah, curious - are a lot of clearml use cases not geared for notebooks?
That is somewhat correct; notebooks are not actually used in a lot of deep-learning projects, as those require an entire repository to support them.
I guess generally speaking the workflow is, "test your code" (i.e. small scale with limited data), then clone and enqueue for remote execution.
That said, I think it will be great to expand the support.
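For example, a minimal sketch of that clone-and-enqueue step (project/task/queue names are placeholders):

from clearml import Task

# grab the task you already ran at small scale
template = Task.get_task(project_name="examples", task_name="small-scale-test")
# clone it and send the clone to an agent queue for remote execution
cloned = Task.clone(source_task=template, name="full-scale-run")
Task.enqueue(cloned, queue_name="default")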
TrickySheep9 I like the idea of context for Tasks, can you expand on how...
BTW: if you only need the git diff you can just copy it from the UI into a txt file and do:
git apply <copied-diff.txt>
VictoriousPenguin97 I'm not sure there is an easy solution, basically you have to edit both MongoDB (artifacts) and Elastic (think debug samples) 🙂
StickyLizard47 apologies for https://github.com/allegroai/clearml-server/issues/140 not being followed up on (it probably slipped through the cracks of the backend guys; I can see the 1.5 release happened in parallel). Let me make sure it is followed up.
SarcasticSquirrel56 specifically, did you also spin a clearml-k8s glue? or are the agents statically allocated on the helm chart?
... Would not work for huge LLM-style models.
yes I agree... but then again, if the model is small enough you can just keep it in memory ...
Could you give an example of such configurations ?
(e.g. what would differ from one to another)
BoredHedgehog47 you need to make sure "<path here>/train.py" also calls Task.init (again no need to worry about calling it twice with different project/name)
The Task.init call will make sure the auto-connect works.
BTW: if you do os.fork , then there is no need for the Task.init; the main difference is that Popen starts a whole new process, and we need to make sure the newly created process is auto-connected as well (i.e. calling Task.init)
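For example, a minimal sketch of what the launched script might contain (project/task names here are just placeholders):

from clearml import Task

# calling Task.init again in the subprocess is safe; it re-attaches
# the process to the same task, so the auto-connect works here too
task = Task.init(project_name="examples", task_name="subprocess-train")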
Hi @<1694157594333024256:profile|DisturbedParrot38>
You mean how to tell the agent to pull only some submodules of your git?
If this is the case you can actually remove them on your git branch; a submodule is basically a file with a soft link. Wdyt?
Hope you don't mind linking to that repo
LOL 🙂
The issue only arises upon sending Images. (Both numpy, mpl and PIL)
BTW: they should appear under the Debug Samples tab in the Results section
Are they expanded in the "api_server" ? (I verified on a Linux machine, same error, the env in the api_server is not being resolved)
You cannot change the user once you have mounted the shared folder with either CIFS or NFS
SpotlessFish46 unless all the code is under "uncommitted changes" section, what you have is a link to the git repo + commit id
HugeArcticwolf77 oh no, I think you are correct 🙂
Do you want to quickly PR a fix ?
Could you send the logs?
@<1595587997728772096:profile|MuddyRobin9> are you sure it was able to spin up the EC2 instance? Which ClearML autoscaler version are you running?
DilapidatedDucks58 I see ...
This might be more complicated than one would imagine. A simple solution might be to store a snapshot of the values every time we reach a new maximum; a quick hack might be to add it as text on one of the task's parameters or properties (which we can later add to the table as a custom column).
wdyt?
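Something like this, as a rough sketch (metric/property names are placeholders; user properties can be shown as custom columns in the experiment table):

from clearml import Task

task = Task.current_task()
best_metric = float("-inf")

def maybe_snapshot(value):
    # keep a running maximum and store the snapshot as a user property
    global best_metric
    if value > best_metric:
        best_metric = value
        task.set_user_properties(best_metric=str(best_metric))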
command line 🙂
cmd.exe / bash
Hi ReassuredTiger98
I do not want to share with the clearml-agent workstations.
Long story short, no 🙂
The agent is responsible for spinning all jobs, regardless of users, so basically it has to have a read-only user for all the repositories. I "think" the enterprise version has a vault feature that allows you to store this kind of secret on the User itself.
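For reference, the read-only credentials would go in the agent's clearml.conf, something like this (user/token values are placeholders):

agent {
    # read-only git user the agent uses to clone all repositories
    git_user: "readonly-ci-user"
    git_pass: "<personal-access-token>"
}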
What exactly is the use case?
You'll just need the user to name them as part of loading them in the code (in case they are loading multiple datasets/models).
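For example, a minimal sketch with explicitly named datasets (project/dataset names are hypothetical):

from clearml import Dataset

# load each dataset by an explicit name so multiple datasets stay distinguishable
train_ds = Dataset.get(dataset_project="examples", dataset_name="train-set")
eval_ds = Dataset.get(dataset_project="examples", dataset_name="eval-set")
train_path = train_ds.get_local_copy()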
Exactly! (and yes, UI visualization is coming 🙂 )
Anyway, in the docs, there is a function called task.register_artifact()
Yes, this is rather deprecated... The idea is that it will monitor an object and auto-sync it (i.e. serialize and upload it).
That said, it is just so much easier to do task.upload_artifact ,
and you can always update/overwrite by passing the same name, so I cannot really see the actual use case. Does that make sense? What are you using it for ?
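To illustrate the difference, a short sketch (names are placeholders):

from clearml import Task
import pandas as pd

task = Task.init(project_name="examples", task_name="artifacts-demo")
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# register_artifact: monitors the object and auto-syncs it when it changes
task.register_artifact(name="monitored_df", artifact=df)

# upload_artifact: one-shot upload; uploading again with the same name overwrites it
task.upload_artifact(name="results", artifact_object=df)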
What's the OS / Python version?
FreshParrot56 we could add this capability, but the main caveat is that if your version depends on multiple parent versions you still need to download and extract all the parent versions, which means that if you clear them you might hurt later performance. Does that make sense? What is the use-case / scenario for you?
Is it possible to make a connection to a S3 bucket via this authentication method with the open source version on EKS?
Hi BoredBluewhale23
In your setup, are we talking about agents running inside the Kubernetes cluster, or clients connecting from their own machine ?