Exactly, just pointing to the fact that that machine is yours ;)
To summarize: the scheduler should first assign tasks to the agent that gives the queue the highest priority.
The issue here is that you assume both are idle, and you need a global priority based on resource preference. I understand your scenario now, but it only holds if the enqueuing order is "optimal". For example, if machine Y is running a Task B that is about to complete (e.g. in a minute), then machine X will still pick up the new Task B, and again we end up in the scenario where Task A i...
Sure thing 🙂
BTW: ReassuredTiger98 this is definitely an interesting use case, and I think you can actually write some code to solve it if you like.
Basically, let's follow up on your setup:
Machine X: agents listening to queues A and B_machine_a (notice we have two agents here)
Machine Y: agent listening to queue B_machine_b
Now we (the users) will push our jobs into queues A and B
Now we have a service that does the following:
- see if we have a job in queue B
- check if machine Y is working...
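A rough sketch of that service (queue names from the setup above; the exact APIClient fields for queue entries and worker state are my assumptions and may need adjusting):
```python
from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()

def route_pending_b_tasks():
    # 1. see if we have a job waiting in queue B
    queue_b = client.queues.get_all(name="B")[0]
    entries = client.queues.get_by_id(queue=queue_b.id).entries or []
    if not entries:
        return
    # 2. check if machine Y (the agent listening on B_machine_b) is busy
    workers = client.workers.get_all()
    y_busy = any(
        w.task for w in workers
        if any(q.name == "B_machine_b" for q in (w.queues or []))
    )
    # 3. push the waiting task to whichever machine's queue is free
    target = "B_machine_a" if y_busy else "B_machine_b"
    Task.enqueue(task=entries[0].task, queue_name=target)
```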
but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
How so?
So clearml-init can be skipped, and I provide the users with a template and ask them to append the credentials at the top, is that right?
Correct
What about the "Credential verification" step in the clearml-init command? That won't take place in this pipeline, right? Will that be a problem?
The verification test is basically making sure the credentials were copy-pasted correctly.
You can achieve the same by just running the following in your Python console:
```
from clearml import Ta...
```
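I assume the snippet was something along these lines (a hedged guess; any call that authenticates against the server will do):
```python
from clearml import Task

# Task.init authenticates against the server, so it fails immediately
# if the pasted credentials are wrong
task = Task.init(project_name="credentials-check", task_name="verify")
task.close()
```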
WackyRabbit7 if this is a single script running without a git repo, you will actually get the entire code in the uncommitted changes section.
Do you mean get the code from the git repo itself ?
> Create a new version of the dataset by choosing which SemVer increment I would like for this version number (major/minor/patch), and upload
Oh, this is already there:
```
from clearml import Dataset

cur_ds = Dataset.get(dataset_project="project", dataset_name="name")
# if a version is not given, the last semantic version number is auto-incremented, e.g. 1.2.3 -> 1.2.4
new_ds = Dataset.create(dataset_project="project", dataset_name="name", parents=[cur_ds.id])
```
WackyRabbit7
Cool - so that means the fileserver which comes with the host will stay empty? Or is there anything else being stored there?
Debug Images and artifacts will be automatically stored to the file server.
If you want your models to be automagically uploaded, add the following:
```
task = Task.init('example', 'experiment', output_uri='...')
```
(You can obviously point it to any other http/S3/GS/Azure storage)
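For example (bucket/host names below are placeholders):
```python
from clearml import Task

# point output_uri at whatever storage you use; model uploads then happen automatically
task = Task.init('example', 'experiment', output_uri='s3://my-bucket/models')
# or: output_uri='gs://my-bucket/models'
# or: output_uri='azure://my-container/models'
# or: output_uri='http://my-files-server:8081'
```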
Yeah you can ignore those, this is some python GC stuff; it seems to be related to the OS and python version
Yes exactly like a Task (pipeline is a type of task)
```
cloned_pipeline = Task.clone(pipeline_uid_here)
Task.enqueue(cloned_pipeline, ...)
```
Hi @<1610083503607648256:profile|DiminutiveToad80>
This sounds like the wrong container ? I think we need some more context here
RobustRat47 are you saying updating the nvidia drivers solved the issue ?
SubstantialElk6
The `~<package name with the first letter dropped> == a.b.c` entry
is a known conda/pip temporary-install issue (a leftover from a previous package install).
The easiest fix is to find the site-packages folder and delete the leftover package folder, or create a new virtual environment.
BTW:
pip freeze will also list these broken packages
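For example, to locate the folder to clean up (a minimal sketch):
```python
import sysconfig

# print the site-packages path; look for leftover "~..." folders inside it and delete them
print(sysconfig.get_paths()["purelib"])
```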
I should manually copy it to the remote services agents?
The code itself needs to run somewhere; currently this has to be your machine: either you manually run the AWS autoscaler, or an agent runs it for you. Make sense?
I can probably have a python script that checks if there are any tasks running/pending, and if not, runs docker-compose down to stop the clearml-server, then uses boto3 to trigger the creation of an EBS snapshot, waits until it is finished, then restarts the clearml-server. wdyt?
I'm pretty sure there is a nice way, let me check something
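Meanwhile, a rough sketch of what you described (the volume id, compose path, and task filter below are placeholders):
```python
import subprocess
import boto3
from clearml import Task

COMPOSE_DIR = "/opt/clearml"         # where docker-compose.yml lives
VOLUME_ID = "vol-0123456789abcdef0"  # the EBS volume backing the server data

def backup():
    # only proceed when nothing is running or queued
    busy = Task.get_tasks(task_filter={"status": ["in_progress", "queued"]})
    if busy:
        return

    subprocess.run(["docker-compose", "down"], cwd=COMPOSE_DIR, check=True)
    try:
        ec2 = boto3.client("ec2")
        snap = ec2.create_snapshot(VolumeId=VOLUME_ID, Description="clearml-server backup")
        ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])
    finally:
        subprocess.run(["docker-compose", "up", "-d"], cwd=COMPOSE_DIR, check=True)
```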
I keep getting a "failed getting token" error
MiniatureCrocodile39 what's the server you are using ?
I was hoping that there's a universal flag somewhere. Asking this because I want all the Models and Artifacts to be stored in one place and the users shouldn't have to edit their configuration files.
You mean like make sure all models/artifacts are always uploaded?
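If so, the per-client default can be set once in clearml.conf (the key is from the default config template; the bucket is a placeholder):
```
# clearml.conf
sdk {
    development {
        # default destination for models/artifacts when output_uri is not set explicitly
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```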
Hi OddShrimp85
I think numpy 1.24.x is broken in a lot of places; we have noticed scikit-learn breaks on it, TF and others 😞
I will make sure we fix this one
Or am I forced to do a get, check if the latest version is finalized,
A Dataset must be finalized before using it. The only situation where it is not finalized is while you are still in the "upload" state.
, then increment that version and create my new version?
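In code, that get → check → create flow would look roughly like this (a sketch; `is_final()` is the finalization check I believe the SDK exposes):
```python
from clearml import Dataset

latest = Dataset.get(dataset_project="project", dataset_name="name")
if latest.is_final():  # False only while the dataset is still uploading
    new_ds = Dataset.create(
        dataset_project="project",
        dataset_name="name",
        parents=[latest.id],
    )
    # the version auto-increments the last SemVer number, e.g. 1.2.3 -> 1.2.4
```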
I'm assuming there is a data processing pipeline pushing new data?! How do you know you have new data to push?
Task.connect is "automagic", i.e. it writes to the server when in manual mode and reads from the server when running under an agent.
set_parameter is one-way only and should be used to set an external Task's parameters.
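To illustrate the difference (the task names/ids here are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params-demo")

# connect(): two-way - values are written to the server on a manual run,
# and overridden from the server when an agent executes the task
params = {"lr": 0.001, "batch_size": 32}
task.connect(params)

# set_parameter(): one-way push, e.g. onto some other (external) task
external = Task.get_task(task_id="<task-id>")
external.set_parameter("General/lr", 0.01)
```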
GiddyTurkey39 do you mean to delete them from the server?
I think the only way is using the API, with Task.query_tasks and a filter. Would that have helped?
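Something like this sketch (the filter is just an example; Task.delete() removes the task from the server):
```python
from clearml import Task

# example: find all archived tasks in a project
task_ids = Task.query_tasks(
    project_name="examples",
    task_filter={"system_tags": ["archived"]},
)
for tid in task_ids:
    Task.get_task(task_id=tid).delete()
```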
This line 🙂
None
Notice that Triton (and therefore clearml-serving) needs the PyTorch model to be converted into TorchScript, so that the Triton backend can load it.
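For example, converting a model to TorchScript (the model here is a stand-in for your trained one):
```python
import torch

model = torch.nn.Linear(10, 2).eval()  # stand-in for your trained model
example_input = torch.randn(1, 10)

# trace() works for models without data-dependent control flow;
# otherwise use torch.jit.script(model)
scripted = torch.jit.trace(model, example_input)
scripted.save("model.pt")  # this file is what the Triton PyTorch backend loads
```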
Hi @<1610083503607648256:profile|DiminutiveToad80>
You mean the pipeline logic? It should autodetect the imports of the logic function (like any Task.init call)
You can however call Task.force_requirements_env_freeze
and pass a local requirements.txt
Make sure to call it before creating the Pipeline object
None
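I.e., roughly (the pipeline name/project are placeholders):
```python
from clearml import Task
from clearml.automation import PipelineController

# must be called before the PipelineController object is created
Task.force_requirements_env_freeze(requirements_file="requirements.txt")

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
```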
This points to the wrong cu117 / driver - could that be?
Send the full Task log; you can DM it if that is easier
For running the pipeline remotely I want the path to be like /Users/adityachaudhry/.clearml/cache/......
I'm not sure I follow. If you are getting a path with all your folders from get_local_copy, that's exactly what you are looking for, no?
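For reference, this is the call I mean (the cache path in the comment is just the typical default):
```python
from clearml import Dataset

ds = Dataset.get(dataset_project="project", dataset_name="name")
local_path = ds.get_local_copy()
# typically resolves under ~/.clearml/cache/, with your dataset's folder structure inside
print(local_path)
```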