Hi ScantChimpanzee51
How are you launching the code ?
Basically the easiest way is to do so with the example you just mentioned,
Can this issue be reproduced ?
It should actually work the same, if you find out it fails to properly register let me know (and then I guess a github issue is the next step)
Hmm, I think you should use --template-yaml
Is this some sort of polling ?
yes
At the end of the day, we are just worried whether this will hog resources compared to a webhook. Any ideas?
No need to worry, it pulls every 30 sec, and this is negligible (as a comparison any task will at least send a write request every 30 sec, if not more)
Actually webhooks might be more taxing on the server, as you need to always have a webhook up (i.e. wasting a socket ...)
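To make the polling idea concrete, here is a minimal sketch of a fixed-interval poll loop. It is not the actual ClearML implementation; `poll`, `fetch`, `max_polls` and the injectable `sleep` are hypothetical names used only for illustration, and the 30 second default is taken from the message above.

```python
import time
from typing import Callable, List, Optional


def poll(fetch: Callable[[], Optional[str]],
         interval_sec: float = 30.0,
         max_polls: int = 3,
         sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call `fetch` once per cycle and collect the non-empty results.

    `fetch` stands in for one cheap request to the server (hypothetical).
    """
    results: List[str] = []
    for i in range(max_polls):
        item = fetch()  # one small request per cycle
        if item is not None:
            results.append(item)
        if i < max_polls - 1:
            sleep(interval_sec)  # idle between polls, nothing is kept listening
    return results
```

Passing `sleep` as an argument just makes the loop easy to try without actually waiting 30 seconds.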
I think CostlyOstrich36 managed to reproduce?!
This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas
What's the value of running outside of an experiment management context? Don't you want to log it?
There is no real penalty here, no?!
I was wondering what is the use of PipelineController.create_draft if you can't use it to clone and run tasks, as we have seen
I think the initial thought was to allow creating a pipeline from a pipeline programmatically. Then once you have the "pipeline" you can manually enqueue it and modify it. Think of a pipeline constructing other pipelines in flight based on some logic, then launching them in parallel.
make sense ?
Hi ConvolutedChicken69
assuming you are running the agent in venv mode you can do something like:
$ CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 clearml-agent daemon --queue default
This will basically only clone the code and use the default python the clearml-agent itself is using.
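If you launch the agent from a script rather than a shell, the same thing can be sketched in Python. The helper name `agent_command` is hypothetical; the environment variable and the command line are the ones from the message above.

```python
import os
from typing import Dict, List, Tuple


def agent_command(queue: str = "default") -> Tuple[List[str], Dict[str, str]]:
    """Build the agent command and an environment that skips the venv install."""
    env = dict(os.environ)  # copy, so the current process is not modified
    env["CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"] = "1"
    cmd = ["clearml-agent", "daemon", "--queue", queue]
    return cmd, env


# To actually start the agent (requires clearml-agent to be installed):
# import subprocess
# cmd, env = agent_command("default")
# subprocess.run(cmd, env=env, check=True)
```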
Does that help?
BTW:
it gets an error as it can't find it with pip.
What's the error? how come the package cannot be installed ?
So inside the pipeline logic you can do Task.current_task().id
Or inside a component Task.current_task().parent
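Roughly, those two one-liners could be wrapped like this. The function names are hypothetical, and the sketch assumes `clearml` is installed and that the code runs inside a Task (otherwise `Task.current_task()` returns None).

```python
from typing import Optional


def pipeline_task_id() -> Optional[str]:
    """Inside the pipeline logic: the ID of the currently running Task."""
    from clearml import Task  # imported lazily, needs clearml installed
    task = Task.current_task()
    return task.id if task else None


def parent_pipeline_id() -> Optional[str]:
    """Inside a component: the ID of the parent Task (the pipeline)."""
    from clearml import Task
    task = Task.current_task()
    return task.parent if task else None
```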
GrievingTurkey78
maybe since the package is not directly imported in my code it is possible to get a different version to what I have locally (?).
If these are derivative packages (i.e. imported by other packages), they are not automatically logged when executing the Task manually (in order to keep the "installed packages" list as lean as possible on the one hand, but still specify the important packages for you on the other)
That said, when the "trains-agent" executed the task it will store nack...
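If you want an indirectly imported package to be pinned anyway, one option is to add it explicitly before the Task is created. This is only a sketch: the helper name and the package name in the usage comment are hypothetical, and it assumes `clearml` is installed.

```python
from typing import Dict


def init_with_pinned(project_name: str, task_name: str, pinned: Dict[str, str]):
    """Register extra requirements, then create the Task."""
    from clearml import Task  # imported lazily, needs clearml installed

    for package_name, version in pinned.items():
        # add_requirements has to be called before Task.init
        Task.add_requirements(package_name, version)
    return Task.init(project_name=project_name, task_name=task_name)


# Hypothetical usage:
# task = init_with_pinned("examples", "pinned deps", {"some-indirect-package": "1.2.3"})
```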
DeliciousBluewhale87
Upon ssh-ing into the folders in both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there..
Hmm that means it is working...
Do you see any *.conf files there? What do they contain? (Do they point to the correct clearml-server config?)
And are you sure you are pointing to the correct API server and not mixing up the API and WEB addresses?
Also what's the clearml-server version?
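A quick way to spot an API/WEB mix-up is to compare the ports in the configured URLs. The sketch below assumes a default self-hosted clearml-server (API on 8008, web on 8080, files on 8081); a hosted or customized deployment will use different addresses, so treat any warning as a hint only. `check_servers` is a hypothetical helper, not part of ClearML.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Default ports of a self-hosted clearml-server (assumption: default deployment)
DEFAULT_PORTS = {"api_server": 8008, "web_server": 8080, "files_server": 8081}


def check_servers(servers: Dict[str, str]) -> List[str]:
    """Return warnings for missing entries or ports that differ from the defaults."""
    warnings: List[str] = []
    for key, expected in DEFAULT_PORTS.items():
        url = servers.get(key)
        if not url:
            warnings.append(f"{key} is missing")
            continue
        port = urlparse(url).port
        if port is not None and port != expected:
            warnings.append(f"{key} uses port {port}, default is {expected}")
    return warnings
```

For example, an `api_server` value ending in `:8080` would be flagged, which is the typical symptom of pasting the web address into the API field.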
Hi ShallowLion60
How did you create those credentials ?
Hi SlipperySheep79
Is there a way to specify the working dir from the decorator
not directly, but why would that change anything? I mean the component code will be created in the git root, and you can still access files inside the subfolders
from .subfolder import something
what am I missing?
This would work to load the local modules, but I’m also using poetry and the pyproject.toml is in the subdirectory, so the agent won’t install any dependency if I don’t set the work_dir
hmmm true, in terms of requirements, you can list them in the decorator (see the packages argument)
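Something along these lines, as a rough sketch. The wrapper `make_component`, the step `add_numbers` and the package names in the usage comment are hypothetical; only the `packages` argument of the component decorator comes from the message above, and `clearml` needs to be installed for it to run.

```python
from typing import Callable, List


def make_component(packages: List[str]) -> Callable:
    """Wrap a step as a pipeline component with an explicit requirements list."""
    from clearml.automation.controller import PipelineDecorator  # lazy import

    @PipelineDecorator.component(return_values=["total"], packages=packages)
    def add_numbers(a, b):
        # Placeholder step body (hypothetical)
        return a + b

    return add_numbers


# Hypothetical usage, listing what pyproject.toml would otherwise provide:
# step = make_component(["pandas", "scikit-learn"])
```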
Hi PanickyMoth78
it was uploading fine for most of the day but now it is not uploading metrics and at the end
Where are you uploading metrics to (i.e. where is the clearml-server) ?
Are you seeing any retry logging on your console?
packages/clearml/backend_interface/metrics/reporter.py", line 124, in wait_for_events
This seems to be consistent with waiting for metrics to be flushed to the backend, but usually you will see retry messages on your console when that happens
For example, could you test if this one works:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
So like a UI for creating pipelines doing different things on the different solutions ?
ShallowGoldfish8 I believe it was solved in 1.9.0, can you verify?
pip install clearml==1.9.0
Thanks ScantChimpanzee51 !
Let me see what I can find, should be easy enough to fix now 🙂
Do you have any experience and things to watch out for?
Yes, for testing start with cheap node instances 🙂
If I remember correctly everything is preconfigured to support GPU instances (aka nvidia runtime).
You can take one of the templates from here as a starting point:
https://aws.amazon.com/blogs/compute/running-gpu-accelerated-kubernetes-workloads-on-p3-and-p2-ec2-instances-with-amazon-eks/
I think we should open a GitHub Issue and get some more feedback, maybe we should just add support in the backend side ?
If you are using the "default" queue for the agent, notice you might need to run the agent with --services-mode to allow for multiple pipeline components on the same machine
The package detection is done when running the code on your laptop, and this is when it first logs the packages and versions. Following it, what do you have on your laptop? OS/Conda/Python
I have an idea, can you try with:
task = Task.init(..., reuse_last_task_id=False)
I have a suspicion it starts the Tasks in parallel, and the "reuse_last_task_id" causes them to "reuse the same task locally" which makes them overwrite the configuration of one another.
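Spelled out a bit more, the suggestion could look like this. `start_isolated_task` is a hypothetical wrapper, and whether it fixes the overwrite depends on the suspicion above being right; it assumes `clearml` is installed.

```python
def start_isolated_task(project_name: str, task_name: str):
    """Always create a new Task instead of reusing the last local one."""
    from clearml import Task  # imported lazily, needs clearml installed

    return Task.init(
        project_name=project_name,
        task_name=task_name,
        reuse_last_task_id=False,  # do not reuse a previous local Task
    )
```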
So now for it to take place you need to enqueue the Task and set an agent to pick it up and run it.
When the agent is running the Task the new parameter will be passed.
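For reference, a rough sketch of doing the clone, edit and enqueue steps from code instead of the UI. The helper name, the clone name and the parameter key in the usage comment are hypothetical; it assumes `clearml` is installed and an agent is listening on the queue.

```python
from typing import Any, Dict


def clone_and_enqueue(source_task_id: str,
                      overrides: Dict[str, Any],
                      queue_name: str = "default") -> str:
    """Clone a Task, change parameters on the clone, and enqueue it for an agent."""
    from clearml import Task  # imported lazily, needs clearml installed

    cloned = Task.clone(source_task=source_task_id, name="clone with new parameters")
    for name, value in overrides.items():
        cloned.set_parameter(name, value)  # stored on the Task, applied when the agent runs it
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id


# Hypothetical usage:
# clone_and_enqueue("<task id>", {"General/learning_rate": 0.01})
```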
does that make sense ?
Hi BitterStarfish58
Where are you uploading it to?
WackyRabbit7 How do I reproduce it ?
, but what I really want to achieve is to share this code:
You mean to share the code between them? Unless this is a "preinstalled" package in the container, each endpoint has its own separate set of modules / files
(this is on purpose, so you could actually change them, just imagine different versions of the same common.py file)