ElegantKangaroo44 I think TrainsCheckpoint would probably be the easiest solution. It would not be a must, but another option to deepen the integration and allow us more flexibility.
I think the RC should be out in a day or two, meanwhile: pip install git+https://github.com/allegroai/clearml.git
but it is still not able to run any task after I abort and rerun another task
When you "run" a task you are pushing it to a queue, so how come the queue is empty? What happens after you push your newly cloned task to the queue?
Yes, because when a container is executed, the agent creates a new venv that inherits from the system-wide installed packages, but it cannot inherit or "understand" that there is an existing venv, or where it is.
@<1597762318140182528:profile|EnchantingPenguin77> can you provide the full log?
CurvedHedgehog15 is it plots or scalars you are after ?
Thread is discussed here: None
Can my request be made into a new feature, so that we can tag the same type of graphs under one main tag?
Sure, open a Git Issue :)
if I run my own ClearML self-hosted server?
Then you have everything on your end; it will not communicate with the SaaS offering, meaning no limits whatsoever.
(That said some of the cloud auto-scaling and compute features are not part of the open source)
Hi @<1533620191232004096:profile|NuttyLobster9>
, but no system stats ...
If the job is too short (I think 30 seconds), it doesn't have enough time to collect stats (basically it collects them over a 30 sec window, but the task ends before it sends them)
does that make sense ?
In fact, I assume we need to write our own custom HyperParameterOptimizer, am I right?
Yes exactly! It should be very easy.
Just inherit from RandomSearch and change create_job
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
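For example, a minimal sketch of such a subclass (the hook inside create_job is just a hypothetical placeholder for your own logic):
```python
from clearml.automation import HyperParameterOptimizer, RandomSearch

class MyRandomSearch(RandomSearch):
    def create_job(self):
        # let the base class sample a parameter set and clone the base task
        job = super().create_job()
        # hypothetical hook: add custom sampling/filtering logic here
        # before handing the job back to the optimizer loop
        return job

# then pass it to the optimizer instead of the built-in strategy:
# optimizer = HyperParameterOptimizer(..., optimizer_class=MyRandomSearch)
```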
Hi @<1597762318140182528:profile|EnchantingPenguin77>
, but it seems like clearml always creates a virtual environment
Yes that's correct, but the new venv inside the container inherits from the system packages (so if nothing changes it does nothing)
Is there a way that I can have clearml-task automatically use the activated custom virtual environment in my docker and run the scripts?
You can, but the "correct" way to work with python and co...
Maybe WackyRabbit7 is a better approach as you will get a new object (instead of the runtime copy that is being used)
It's the safest way to run multiple processes and make sure they are cleaned afterwards ...
HugeArcticwolf77 changing the color is definitely a feature we will have in the next version; right now I think you cannot. It is randomly chosen based on the title/series, and I think your example is a great failure case of that randomness
I can't seem to figure out what the names should be from the pytorch example - where did INPUT__0 come from
This is actually the layer name in the model:
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666ee85fd126fcb49e2e/examples/pytorch/train_pytorch_mnist.py#L24
Which is just the default name PyTorch gives the layer
https://discuss.pytorch.org/t/how-to-get-layer-names-in-a-network/134238
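If it helps, a quick sketch for listing the layer names a model exposes (the tiny model below is just a placeholder, substitute your own):
```python
import torch.nn as nn

# placeholder model; use your own trained model instead
model = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))

for name, module in model.named_modules():
    print(name or "<root>", "->", module.__class__.__name__)
```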
it appears I need to convert it into TorchScript?
Yes, this ...
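Something along these lines should do it (a sketch assuming the Net model from the linked MNIST example; the weights file name and output path are placeholders):
```python
import torch
from train_pytorch_mnist import Net  # model class from the linked example (assumption)

model = Net()
model.load_state_dict(torch.load("mnist_cnn.pt", map_location="cpu"))  # hypothetical weights file
model.eval()

# trace with a dummy MNIST-shaped input (batch of 1, 1x28x28)
scripted = torch.jit.trace(model, torch.zeros(1, 1, 28, 28))
scripted.save("serving_model.pt")  # this TorchScript file is what gets served
```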
I see, that means xarray is not an actual package but a folder added to the python path.
This explains why Task.add_requirements fails, as it is supposed to add python packages to the equivalent of "requirements.txt" ...
Is the folder part of the git repository? How would you pass it to the remote machine the clearml-agent is running on?
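For reference, this is roughly how add_requirements is meant to be used with an actual pip package (the names below are placeholders):
```python
from clearml import Task

# must be called before Task.init(); records a pip-installable package
# in the task's requirements
Task.add_requirements("xarray")  # or pin a version: Task.add_requirements("xarray", "2024.1.0")
task = Task.init(project_name="examples", task_name="requirements demo")
```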
Could it be that clone has to be False? (I assume the reasoning is the cloning feature)
WickedGoat98 did you set up a machine with trains-agent pulling from the "default" queue?
Nope - confirmed to be running on the OS's Python environment,
Okay, so running as root on bare metal is definitely not recommended.
I'm not sure how/why it gets stuck though
Any chance you can run the agent as non-root?
Also, docker mode is maybe preferable, so it is easier for you to control the environment of the Task
Hi QuaintJellyfish58
This is odd; this "undefined" project is also marked as "Example", which would explain why you cannot delete it, but not how you ended up with one
Any idea on what changed on your server ?
Hi @<1798887585121046528:profile|WobblyFrog79>
. When I execute the pipeline remotely in Kubernetes, those components
Two things: one, make sure you specify the repo you need the components from in the decorator function. What will happen is the repo will be cloned into the container running on k8s, and your script (i.e. the pipeline step) will then run from inside the repo root.
https://github.com/clearml/clearml/blob/9c93aa9e538075c848647dcd88e3e12bec051b5f/clearml/automation/con...
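A sketch of what that looks like with the decorator (the repo URL, branch and packages below are placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    repo="https://github.com/org/my-repo.git",  # hypothetical repo holding the component code
    repo_branch="main",
    packages=["pandas"],
)
def preprocess(data_path: str):
    # runs inside the k8s container, from the cloned repo root
    import pandas as pd
    return pd.read_csv(data_path).shape
```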
yes, I can communicate with the server; I managed to put tasks in the queue and retrieve them, as well as run tasks with metrics reporting
Through the UI or python code ?
We are working hard on release 1.7; once that is out we will push an RC for review (I hope)
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
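In code it looks roughly like this (a sketch; the project, script and queue names are placeholders):
```python
from clearml import Task

# create a task and push it into a queue; an agent bound to that queue
# (e.g. running on a specific EC2 instance type) will pull and execute it
task = Task.create(project_name="examples", task_name="demo", script="train.py")
Task.enqueue(task, queue_name="my-gpu-queue")
```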
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
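For the event/trigger part, something like the TriggerScheduler can be used (a sketch; the task id, queue and project names are placeholders):
```python
from clearml.automation import TriggerScheduler

scheduler = TriggerScheduler(pooling_frequency_minutes=3)
scheduler.add_dataset_trigger(
    schedule_task_id="<batch-inference-pipeline-task-id>",  # task/pipeline to launch
    schedule_queue="default",
    trigger_project="datasets/my-data-lake",  # fire when a new dataset version appears here
)
scheduler.start()
```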
I do expect it to pip install though, which doesn't need root access I think
Correct, it is installed in a venv (exactly for that).
It will not fail if the apt-get fails (only warnings)
Let me know if it worked
AgitatedTurtle16 could you check with the latest clearml RC (I remember a similar issue was fixed).
pip install clearml==0.17.5rc3
Then run again:
clearml-task ...