The easiest way would be to rename a queue to "1xgpu 16gb", then make sure only machines with >16GB GPUs listen to it.
Note that an agent can listen to multiple queues.
OK, so if I've got, like, 2x16GB GPUs ...
You could do:
clearml-agent daemon --queue "2xGPU_32gb" --gpus 0,1
Which will always use the two GPUs for every Task it pulls.
Or you could do:
clearml-agent daemon --queue "1xGPU_16gb" --gpus 0
clearml-agent daemon --queue "1xGPU_16gb" --gpus 1
Which will give you two agents, one per GPU (with 16GB per Task it runs).
Or:
clearml-agent daemon --queue "2xGPU_32gb" "1xGPU_16gb" --gpus 0,1
Which will first pull Tasks from the "2xGPU_32gb" qu...
What's the trains-server version?
Makes sense. BTW: you can manually add a data visualization to a Dataset with dataset.get_logger().report_table(...)
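For example, a minimal sketch (dataset/project names and the table contents are made up):

```python
import pandas as pd
from clearml import Dataset

# create a dataset version and attach a preview table to it
dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
df = pd.DataFrame({"file": ["a.jpg", "b.jpg"], "label": ["cat", "dog"]})
dataset.get_logger().report_table(
    title="Samples", series="preview", iteration=0, table_plot=df
)
```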
One example is a node that resizes images: this node receives a Dataset as input, iterates over each image, resizes it, and outputs a new Dataset, which is used by the next node downstream in the Pipeline.
I agree, this sounds like a "function" rather than a job, so it's better suited for Kedro.
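For what it's worth, such a node really can stay a plain function; a rough sketch (paths and sizes are made up) that could live in either framework:

```python
from pathlib import Path
from PIL import Image

def resize_images(input_dir: str, output_dir: str, size=(256, 256)) -> str:
    """Function-style node: read every image in input_dir,
    resize it, and write the result to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_path in Path(input_dir).glob("*.jpg"):
        Image.open(img_path).resize(size).save(out / img_path.name)
    return str(out)
```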
organization structure and see for yourself (this pipeline has two nodes, train_model and predict)
Interesting! Let me dive into that and ...
It should be autodetected, and listed in the installed packages with something like:
keras-contrib @ git+https://www.github.com/keras-team/keras-contrib.git
Is this what you are seeing?
If not, you can add it manually with:
Task.add_requirements('git+https://www.github.com/keras-team/keras-contrib.git')
Task.init(...)
Note that add_requirements must be called before Task.init.
Hi HappyLion37
It seems that you are "reusing" the Tasks, which means the second time you open them you are essentially resetting the old run and starting all over.
Try to do:
task1 = Task.init('examples', 'step one', reuse_last_task_id=False)
print('do stuff')
task1.close()
task2 = Task.init('examples', 'step two', reuse_last_task_id=False)
print('do some more stuff')
task2.close()
Hi WickedStarfish97
As a result, I don't want the Agent to parse what imports are being used / install dependencies whatsoever
Nothing to worry about here: even if the agent detects the python packages, they are installed on top of the preexisting packages inside the docker. That said, if you want to override it, you can also pass packages=[]
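A minimal sketch, assuming you are creating the Task from code (project/repo names are illustrative); passing packages=[] skips the requirements analysis entirely:

```python
from clearml import Task

# sketch: no auto-detected requirements; rely on the docker image's packages
task = Task.create(
    project_name="examples",
    task_name="docker-only-deps",
    repo="https://github.com/user/repo.git",  # illustrative
    script="train.py",
    packages=[],  # empty list == skip requirements analysis
)
```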
Hey WickedGoat98
I found the bug: it is due to the fact that the numpy array (passed to plotly) contains both datetime and NaN values, and plotly.js does not like that. I'll make sure this is fixed; in the meantime you can just remove the first row (it contains the NaN):
df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
How does ClearML select the reference branch? Could it be that ClearML only checks the "origin" remote?
Yes 🙂 I think we can quickly fix that, I'm just trying to figure out if there are downsides to running "git ls-remote --get-url" without origin
Do people generally update the same model "entry"? That feels so wrong to me... how do you reproduce an older model version or do a rollback, etc.?
Correct, they do not 🙂 On the Task itself the output models will reflect the different filenames you saved; usually people just add a running number.
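e.g. a PyTorch sketch (model and filenames are made up); each unique filename shows up as its own output model on the Task:

```python
import torch
from clearml import Task

task = Task.init(project_name="examples", task_name="checkpoints")
model = torch.nn.Linear(10, 2)

for epoch in range(3):
    # ... train ...
    # the running number in the filename keeps every checkpoint distinct
    torch.save(model.state_dict(), f"model_epoch_{epoch:03d}.pt")
```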
I'm a full-stack developer at Core. I'd be looking to extend the TRAINS frontend and backend APIs to suit my needs: on-prem data storage integration and lots of other customization, e.g. a job scheduler (CRON), dataset augmentation, a custom annotation tool, etc.
That is awesome! Feel free to post a specific question here, and I'll try to direct you to the right place 🙂
Can you guide me to one such tutorial that teaches how to customize the backend/frontend, with an example?
You mean l...
The issue itself is changing the default user:
USER appuser
WORKDIR /home/appuser
Any reason for it?
See here:
https://pip.pypa.io/en/stable/user_guide/#environment-variables
Pass these environment variables as part of the YAML template you are using with k8s.
Should work for both 🙂
So you want these two on two different graphs?
HandsomeCrow5 Ideas on improvement are always welcome 🙂
is how you would create different queues,
SarcasticSquirrel56 you can create them from the UI when the server is already running
(if you are asking how to create them in the first installation, then yes, you are correct, this is possible in the helm chart, I think 🙂)
JitteryCoyote63 The release was delayed due to a last-minute issue; it should be released later today. Anyhow, the code is updated on GitHub, so you can start implementing :) Let me know if I can be of help :)
Let's try:
```
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean
chown -R root /root/.cache/pip
export DEBIAN_FRONTEND=noninteractive
export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0"
[ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git"
declare LOCAL_PYTHON
for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && b...
```
So the way it works is that when you run a component, the return value, together with the entire function execution, is cached. Basically:
this did NOT add the artifact to the pipeline via caching on subsequent runs
you just need to do:
PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
return Task.current_task().artifacts['images'].url
This will return the URL of the uploaded images (i.e. S3 bucket)
which means if this is cached you will get it...
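Putting it together, a rough sketch of such a cached component (decorator arguments and names are illustrative):

```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=True)
def prepare_images(img_dir: str) -> str:
    # upload the folder as an artifact and return only its URL;
    # the returned string is what gets cached between runs
    PipelineDecorator.upload_artifact(
        name="images", artifact_object=img_dir, wait_on_upload=True
    )
    return Task.current_task().artifacts["images"].url
```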
Ohh StraightCoral86 did you check clearml-task? This is exactly what it does
(this is the CLI, from code you basically call Task.create & Task.enqueue)
Will this solve it?
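From code, a minimal sketch (repo, script, and queue names are illustrative):

```python
from clearml import Task

# create a draft Task from an existing repo/script, then enqueue it
task = Task.create(
    project_name="examples",
    task_name="remote-run",
    repo="https://github.com/user/repo.git",  # illustrative
    script="train.py",
)
Task.enqueue(task, queue_name="default")
```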
LethalCentipede31 I think seaborn is using matplotlib, it should just work:
https://github.com/allegroai/clearml/blob/6a91374c2dd177b7bdf4c43efca8e6fb0d432648/examples/frameworks/matplotlib/matplotlib_example.py#L48
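e.g. a quick check (using the seaborn sample dataset):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from clearml import Task

task = Task.init(project_name="examples", task_name="seaborn-plot")

# seaborn draws through matplotlib, so plt.show() is captured automatically
df = sns.load_dataset("iris")
sns.scatterplot(data=df, x="sepal_length", y="sepal_width", hue="species")
plt.show()
```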
PanickyMoth78 ScantMoth28
With several models saved by the training process (whose code is not task-aware)
You can actually specify which models should be saved:
task = Task.init(..., auto_connect_frameworks={'pytorch': ['*.pt']})
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
This way you can upload only the model you need.
Disable automatic model uploads
Disable the auto upload:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
I think that the first model saved gets the task name as its name and the following models take
f"{task_name} - {file_name}"
Hmm, I'm not sure what would be a good way to make it consistent, would it make sense to always have the model file name?
I guess it takes some time before the correct names are assigned?
Hmm that is odd, I have a feeling it has to do with calling Task.close()?!
I just tried with the latest clearml version and it seemed to work as expected
however can you see the inconsistency between the key and the name there:
Yes, that was my point on "uniqueness" ... 🙂
The model key must be unique, and it is based on the filename itself (the context is known, since it is inside the Task), but the Model Name is an entity, so it should have the Task Name as part of the entity name. Does that make sense?
What we would like ideally, is a system where development, training, and deployment are almost one and the same thing, to reduce the lead time from development code to production models.
This is very aligned with the goals of ClearML 🙂
I would like to understand more about what is currently missing in ClearML, so we can better support this approach
my inexperience in using them a lot until recently. I can see how that is a better solution
I think I failed in explaining myself. I me...