Hi BroadMole64
'from X import Y' fails, saying there is no such module X. Any help? Thanks.
Can you see package X under the "Execution" tab, in the "Installed Packages" section?
(Think of this section as a requirements.txt; in order for the agent to install the package on the remote machine, it should be listed there.)
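If the package is missing from that list, one option is to add it from code before calling Task.init; a minimal sketch (package name and version are placeholders):
```python
from clearml import Task

# make sure the agent installs this package on the remote machine;
# must be called before Task.init
Task.add_requirements("X", "1.0.0")  # placeholder package / version
task = Task.init(project_name="examples", task_name="test")
```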
SSH is used to access the actual container; all other communication is tunneled on top of it. What exactly is the reason to bind to 0.0.0.0? Maybe it could be a flag you set, but I'm not sure what the scenario is or what we're solving. Thoughts?
Good question 🙂
```python
from clearml import Task
Task.init('examples', 'test')
```
Hi ZippySheep23
Any ideas what might be happening?
I think you exceeded the upload limit (2.36 GB) 🙂
Hi SubstantialElk6
Generically, we would 'export' the preprocessing steps, set up an inference server, and then pipe data through the above to get results. How should we achieve this with ClearML?
We are working on integrating the OpenVINO and Nvidia Triton serving engines into ClearML (both will be available soon)
Automated retraining
In cases of data drift, retraining of models would be necessary. Generically, we pass newly labelled data to fine...
Hi SparklingHedgehong28
What would be the use for an "end of docker hook"? Is this like an abort callback? A completion callback?
instance protection
Do you mean like when the instance just died (like a spot instance in AWS)?
Hi @<1726047624538099712:profile|WorriedSwan6>
On a different issue, do you have any solution for how to make the agent listen to multiple queues?
Each agent is connected with one type of queue that represents the kind of Job that agent will create. You can connect multiple queues to it, and it will pull from all of them, creating the same "type" of job regardless of which queue the task came from. If you want another type of job to be created, just spin up another agent; there is no limit to the number of agents you can spin ...
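For reference, a single agent can pull from several queues; a minimal sketch (queue names are placeholders):
```
clearml-agent daemon --queue high_priority_q low_priority_q
```
When multiple queues are listed, the agent polls them in the given order, so the first queue effectively gets priority.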
(for example, if I somehow start the execution of an agent task in a specific docker container?)
You mean to specify the container from code? Or to make sure the agent can access a private docker container registry? Or is it for a private PyPI package repository?
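If it's the first case, specifying the container from code, a minimal sketch (the image name is illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker test")
# ask the agent (running in docker mode) to execute this task inside the given image
task.set_base_docker("nvidia/cuda:11.0-runtime-ubuntu20.04")  # placeholder image
task.execute_remotely(queue_name="default")
```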
Quick update: Nexus supports direct HTTP upload, which means that, as CostlyOstrich36 mentioned, just pointing to the Nexus HTTP upload endpoint would work:
```python
output_uri="http://<nexus>:<port>/repository/something/"
```
See docs:
https://support.sonatype.com/hc/en-us/articles/115006744008-How-can-I-programmatically-upload-files-into-Nexus-3-
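For example, a minimal sketch (the Nexus host, port and repository path are placeholders):
```python
from clearml import Task

# artifacts and models will be uploaded to the Nexus HTTP endpoint
task = Task.init(
    project_name="examples",
    task_name="nexus upload",
    output_uri="http://<nexus>:<port>/repository/something/",
)
```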
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
🎉
While I do have the access and secret defined in clearml.conf, and even in the WebUI, I still get similar
And do you have your credentials set in the browser when deleting a Task?
Then check in clearml.conf under files_server
And use what you have there (for example http://localhost:8081 )
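For reference, the relevant part of a self-hosted clearml.conf usually looks something like this (the hosts/ports below are the common defaults; adjust to your deployment):
```
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
}
```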
Sure, run:
```
clearml-agent init
```
It is a CLI wizard to configure the initial configuration file.
SmilingFrog76
there is no internal scheduler in Trains
Well, there actually is a scheduler built into Trains: the queues (order / priority).
What is missing from it is multi node connection, e.g. I need two agents running the exact same job working together.
(as opposed to, I have two jobs, execute them separately when a resource is available)
Actually my suggestion was to add a SLURM integration, like we did with k8s (I'm not suggesting Kubernetes as a solution for you, the op...
As far as I understand, ClearML tracks each library called from the scripts and saves the list of these libraries somewhere (I assume this list is saved as a requirements.txt file, which is later loaded into the venv when the pipeline is running).
Correct
Can I edit this file (just to comment out the row with "object-detection==0.1")?
BTW, regarding the object-detection library. My training scripts have calls like:
Yes, in the UI you can right-click on the Task, select "Reset", then it...
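For illustration, once the Task is reset, the "Installed Packages" section becomes editable in the UI, and a commented row is simply skipped when the agent builds the venv; a sketch (versions are made up):
```
clearml == 1.1.4
# object-detection == 0.1
torch == 1.10.0
```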
UnevenDolphin73 it seems this is a UI browser limit, which means we will need to move it into the server ...
See here: https://clearml.slack.com/archives/CTK20V944/p1640247879153700?thread_ts=1640135359.125200&cid=CTK20V944
Hm GiganticTurtle0, let me quickly check it
I like the idea of using the timeit interface, and I think we could actually hack it to do most of the heavy lifting for us 🙂
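For reference, the timeit interface in question looks roughly like this (the statement, setup and repetition count are illustrative):
```python
import timeit

# time 100 runs of a statement, with setup code executed once
elapsed = timeit.timeit(
    stmt="sorted(data)",
    setup="data = list(range(1000))",
    number=100,
)
print(f"100 runs took {elapsed:.4f}s")
```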
but instead, they cannot be run if the files they produce were not committed.
The thing with git: if you have new files and you did not add them, they will not appear in the git diff, hence they will be missing when running from the agent. Does that sound like your case?
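If that is the case, adding and committing the new files before enqueueing should fix it; for example (the file name is a placeholder):
```
git status --short        # untracked files show up with "??"
git add new_file.py       # placeholder file name
git commit -m "add new files"
```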
Hi SarcasticSparrow10
I think the default search is any partial match; let me check if there is a way to do some regexp / wildcard
Or is this a feature of hyperdatasets and i just mixed them up.
Ohh yes, this is it. Hyper Datasets are part of the UI (i.e. there is a Tab with the HyperDataset query); Dataset Usage is currently listed on the Task. Make sense?
Sorry that was a reply to:
Otherwise I could simply create these tasks with Task.init,
I am trying to see if the user can submit a list of resource requirements (e.g 4GPUs, 12 cores, 100GB diskspace) for the task when queuing the task and the agents pick these tasks if they have the requested resources. With this, the user need not think about which queue to send the task to. The users just state what they need and the agents do the scheduling for them.
Can I assume we are talking Kubernetes under the hood for the resource allocation ?
Also, how would one ensure immutability ?
I guess this is the big question, assuming we "know" a file was changed, this will invalidate all versions using it, this is exactly why the current implementation stores an immutable copy. Or are you suggesting a smarter "sync" function ?
The idea of queues is, on the one hand, not to give users too much freedom, and on the other, to allow for maximum flexibility & control.
The granularity offered by K8s (and as you specified) is sometimes way too detailed for a user. For example, I know I want 4 GPUs, but 100GB disk-space? No idea; just give me 3 levels to choose from (if any; actually I would prefer a default that is large enough, since this is by definition for temp cache only). The same argument goes for the number of CPUs...
Ch...