Hi BattyLion34
The Windows issue seems to come from Qt not being installed on the host machine.
Check the pyqt5 version in your "Installed packages",
see here:
https://superuser.com/questions/1433913/qtpy-pythonqterror-no-qt-bindings-could-be-found
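A quick way to verify on the host (a minimal sketch, assuming pyqt5 is the intended Qt binding, as in the linked answer):

```python
# if no Qt bindings (e.g. PyQt5) are installed, importing qtpy fails with
# "No Qt bindings could be found"
import qtpy
print(qtpy.API_NAME, qtpy.QT_VERSION)
```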
Regarding Linux, it seems you are missing the object_detection
package, where do you usually install it from?
while I'm looking to upload local weights
Oh, so this is not "importing an uploaded (existing) model" but manually creating a Model.
The easiest way to do that is actually to create a Task for the Model upload, because the model itself is uploaded to a unique destination path, and this is built on top of the Task.
Does that make sense?
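For reference, a minimal sketch of that approach (project, model and file names are placeholders):

```python
from clearml import Task, OutputModel

# a Task whose only purpose is to register and upload the local weights
task = Task.init(project_name="examples", task_name="upload local weights")

# upload the local weights file; it is stored under a unique destination path
model = OutputModel(task=task, name="my-model")
model.update_weights(weights_filename="/path/to/local/weights.pt")
```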
Sadly, I think we need to add another option like task_init_kwargs
to the component decorator.
what do you think would make sense ?
Hi UnevenDolphin73
Maybe. When the container spins up, are there any identifiers regarding the task etc. available?
You mean at the container level or at clearml?
I create a folder on the bucket per
python train.py
so that the environment variable files don't get overwritten if two users execute almost simultaneously
Nice! I have an idea, how about per user ID? Then they can access their "secrets" based on the owner of the Task:
task.data.user
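Something along these lines (the bucket name is a placeholder):

```python
from clearml import Task

task = Task.current_task()                        # the running Task inside the container
user_id = task.data.user                          # ID of the Task's owner
secrets_path = f"s3://my-bucket/{user_id}/env"    # per-user folder on the bucket
```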
SuperiorDucks36 you mean to manually set an experiment (and the dummy Task is just a way to have an entry to configure), do I understand you correctly ?
Following on that, we are thinking of doing it all for you with a CLI that will basically create a task from the code/repo you already have on your machine. What do you think?
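Roughly the programmatic equivalent would be something like this (repo URL, branch and script names are placeholders):

```python
from clearml import Task

# register an existing repo/script as a Task without running it locally,
# so it can later be enqueued for an agent to execute
task = Task.create(
    project_name="examples",
    task_name="train from repo",
    repo="https://github.com/me/my-repo.git",
    branch="main",
    script="train.py",
)
```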
Would you have an example of this in your code blogs to demonstrate this utilisation?
Yes! I definitely think this is important, and hopefully we will see something there (or at least in the docs)
Hi FunnyTurkey96
Let me check what's the status here
(BTW: Is this for a specific Task or for a specific Project?)
Hi @<1547028031053238272:profile|MassiveGoldfish6>
The issue I am running into is that this command does not give me the dataset version number that shows up in the UI.
Oh no, I think you are correct, it will not return the version per dataset (I will make sure we add it)
But with the dataset ID you can grab all the properties:
Dataset.get(dataset_id="aabbcc").version
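A slightly fuller sketch (the ID is a placeholder; you can also look the dataset up by project/name):

```python
from clearml import Dataset

ds = Dataset.get(dataset_id="aabbcc")
# or: Dataset.get(dataset_project="my project", dataset_name="my dataset")
print(ds.name, ds.id, ds.version)
```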
wdyt
IntriguedRat44 could I ask you to open a GitHub issue on it?
I really do not want it to slip through our fingers...
(BTW: meanwhile I was not able to reproduce it; what are the OS / nvidia drivers you are using?)
I think this is the temp requirements file it creates, not your requirements file. If you attach a log here with the "installed packages" section, maybe we can help debug it.
That depends on the HPO algorithm; basically the jobs will be pushed based on the "concurrent jobs" limit, so you do not end up exploding the queue. It might also be a Bayesian process, i.e. based on previous sets of parameters and runs, which is how hyper-band works (optuna/hpbandster).
Make sense?
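To make that concrete, a minimal sketch of where the concurrency limit is set (task ID, metric names and queue are placeholders):

```python
from clearml.automation import (
    HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange)
from clearml.automation.optuna import OptimizerOptuna

optimizer = HyperParameterOptimizer(
    base_task_id="<template-task-id>",
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("General/batch_size", values=[16, 32, 64]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    max_number_of_concurrent_tasks=4,   # at most 4 jobs in the queue at any time
    total_max_jobs=40,
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```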
These are the prerequisites for the docker service installed on the host machine (where the agent is running)
Basically follow: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
https://docs.docker.com/compose/gpu-support/
(So it re-reads the configuration file)
Anything that can be done?
Basically what I want is a
clearml-session
but with a docker container running JupyterHub instead of JupyterLab.
I missed that
The idea of clearml-session
is to launch a container with jupyterlab (or vscode) on a remote machine, and connect the user's machine (i.e. the machine that executed the clearml-session
CLI) directly into the container.
Replacing the jupyterlab with JupyterHub would be meaningless here, because the idea is that it spins up an instance (contai...
Wait, with the Port it does not work?
Notice that since this is an external S3, you have to specify the port so it knows this is not AWS S3 but a different compatible service
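For example (hostname, port and bucket are placeholders; credentials still go in clearml.conf):

```python
from clearml import Task

# include host:port in the URI so ClearML treats it as an S3-compatible
# endpoint (e.g. MinIO) rather than AWS S3
task = Task.init(
    project_name="examples",
    task_name="external s3 output",
    output_uri="s3://minio.company.local:9000/my-bucket",
)
```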
I managed to set up my (Windows) laptop as a worker and reproduce the issue.
Any insight on how we can reproduce the issue?
BTW: CloudyHamster42 I think this issue was discussed on GitHub, and the final "verdict" was we should have an option to split/combine graphs on the UI side (i.e. similar to the "smoothing" or wall-time axis etc.)
I wonder if I just need to join 2 docker-compose files to run everything in one session
Actually that could also work
But for reference, when I said IP I meant the actual host network IP, not 127.0.0.1 (which is the same as localhost)
We actually plan to create different queues for different types of workloads; we are watching the actual usage a bit to define what types of workloads make sense for us.
That sounds like a great path to take, it will make it very clear for users what they will be getting and why they should use a specific queue.
As for the memory, yes, the reasoning is clear; the main thing we'll have to see is how to define the limits, because we have nodes with quite different resources availab...
Sorry I missed the additional "." in the _update_requirements
Let me check ....
Hi @<1578193384537853952:profile|MoodyOx45>
I have a task A that creates another task B via subprocess.
So the thing about the agent: when it runs the code, there is only one Task to rule them all. Basically any fork/spawned subprocess will automatically be logged as part of the parent Task.
I think that what you want is to build a pipeline from those Tasks? Or create a Task and enqueue it manually directly from Task A?
(btw: you can forcefully cause the subprocess to create its own Task b...
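If the pipeline route fits, a minimal sketch (project and task names are placeholders):

```python
from clearml import PipelineController

# build a pipeline out of the two existing Tasks instead of spawning B from A
pipe = PipelineController(name="A-then-B", project="examples", version="1.0")
pipe.add_step(name="task_a", base_task_project="examples", base_task_name="Task A")
pipe.add_step(name="task_b", parents=["task_a"],
              base_task_project="examples", base_task_name="Task B")
pipe.start(queue="default")
```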
I guess I would need to put this in the extra_vm_bash_script param of the auto-scaler, but it will reboot in a loop, right? Isn't there an easier way to achieve that?
You can edit the extra_vm_bash_script
which means the next time the instance is booted the bash script will be executed.
In the meantime, you can ssh to the running instance and change the ulimit manually, wdyt?
Hi MortifiedCrow63
I finally got GS credentials, and there is something weird going on. I can verify the issue: with model upload I get a timeout error, while upload_artifacts just works.
Just updating here that we are looking into it.
any idea why I cannot select text inside the table?
Ichh, seems like plotly again. I have to admit it is quite annoying to me as well ... I would vote here: None
Yes, the agent's mode is global, i.e. all tasks are either inside docker or in venv. In theory you can have two agents on the same machine one venv one docker listening to two diff queues
I'm hoping we are ready to release
we will try to use Triton, but it's a bit hard with transformer models.
Yes ...
All extra packages we add in serving
So it should work; you can also run your preprocess class manually on your own machine (for debugging). If you pass it a local file (basically the model file downloaded from the UI), it should work.
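Very roughly, something like this (assuming your preprocess.py follows the clearml-serving example layout; file paths, payload and exact method signatures are illustrative and may differ per version):

```python
from preprocess import Preprocess   # your own clearml-serving preprocess class

p = Preprocess()
# if your class implements load(), point it at the model file downloaded from the UI
if hasattr(p, "load"):
    p.load("/path/to/model_downloaded_from_ui")

body = {"text": "example request"}   # whatever your endpoint expects
data = p.preprocess(body, state={}, collect_custom_statistics_fn=None)
print(data)
```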
it. But it's maybe not the best solution
Yes... it is not; separating the pre/post to a CPU instance and letting triton do the GPU serving is a lot more effici...