Hmm two questions: 1. How come it did not detect the packages when you were running the original task manually? 2. Could it be the poetry manager option is not working correctly?! Can you verify the venv is created with all packages? If so can you post the full log?
Hi UnevenOstrich23
--cpu-mode means no GPUs are passed to the Tasks it executes.
--services-mode means that instead of the agent running a single job at a time, it will spin up as many jobs as needed on the same machine
PunyBee36 to get HTTPS, add an AWS ELB in front of the server; the ELB will terminate HTTPS for any outside connection
The point is, "leap"
is properly installed, this is the main issue. And although installed, it is missing the ".so"? What am I missing? What are you doing manually that does not show in the log?
In other words, how did you install it "manually" inside the docker when you mentioned it worked for you when running without the agent?
Hi LonelyMoth90 , where exactly are you getting the error ? Is it trains-agent running your experiment ?
Okay this is indeed reported in the UI, but the trains-agent
is running the experiment, and seems to be failing to clone the repository in question.
Seems like an "https" error; git is actually failing to clone the repository: error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.
Can you manually run the clone command on that machine ? I would guess there is some kind of firewall sitting in the middle of the https connection, and that is causing the git to ...
Simple git clone on that repo works well
On the machine running the trains-agent ?
Hi SmarmySeaurchin8
, I was wondering if I could change the commit id to the current one as well.
Actually that would be possible, but will need a bit of code to support controlling Task properties (not just configuration parameters)
How can I do that without running this Task on its own?
Assuming you have a committed code that already supports it. You can clone the executed Task, and then change the commit ID to the "latest on branch" (see drop down when editing)
Would t...
Could you right-click on the failed experiment, select reset, and send it again for execution?
Could that error be a random network issue ?
(Basically this seems like a generic network error not actually related to the trains-agent)
Is the trains-agent
running in docker mode or venv mode?
Hi BoredSquirrel45
as of today, my required packages aren't being recognized in cloned
Are you saying you are editing the code directly in the cloned Task, then enqueueing the Task, and the agent does not "auto recognize" the packages?
Hmm, you are missing the entry point in the execution (script path).
Also as I mentioned you can either have a git repo or script in the uncommitted changes, but not both (if you have a git repo then the uncommitted changes are the git diff)
Hmm, is there a way to do this via code?
Yes, clone the Task with Task.clone
Then do data = task.export_task()
and edit the data object (see the execution section)
Then update it back with task.update_task(data)
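Putting those steps together, a minimal sketch (the task ID and commit below are placeholders; in an exported Task definition the commit lives under the `script` section as `version_num`):

```python
def set_commit(task_data: dict, commit: str) -> dict:
    """Point an exported Task definition at a specific commit ID."""
    task_data.setdefault("script", {})["version_num"] = commit
    return task_data


def clone_with_commit(task_id: str, commit: str):
    # Requires a running clearml/trains server; IDs here are placeholders.
    from clearml import Task

    cloned = Task.clone(source_task=task_id, name="clone with new commit")
    data = cloned.export_task()          # full Task definition as a dict
    cloned.update_task(set_commit(data, commit))
    return cloned
```

The same `export_task` / `update_task` round-trip works for the other execution fields too (branch, entry point, working directory).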
BTW: the cloning error is actually the wrong branch, if you take a look at your initial screenshot, you can see the line before last branch='default'
which I assume should be branch='master'
(The error itself is still weird, but I assume that this is what git is returning)
I've seen that the file location of a task is saved
What do you mean by that? is it the execution section "entry point" ?
Has anyone done this exact use case - updates to datasets triggering pipelines?
Hi TrickySheep9 seems like this is following a diff thread, am I missing something ?
That is exactly that, the trains-agent is replicating the code from the git repo, and trying to apply the git diff (see uncommitted changes section). Obviously it failed 🙂
Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init
A new experiment is created on the server; it automatically stores the git repo link, commit ID, and the local uncommitted changes. These are all stored on the experiment in the server.
Now assume the trains-agent is running on a different machine (which is always the case even if it is actually on the same machine).
The trains-agent will create a new virtual-environmen...
Can you do it manually, i.e. checkout the same commit id, then take the uncommitted changes (you can copy paste it to diff.txt) then call git apply diff.txt ?
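To make that manual check concrete, here is a sketch using a throwaway repo (in the real case you would clone your actual repository, check out the stored commit ID, and paste the "uncommitted changes" section into diff.txt):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "print('hello')" > train.py
git add train.py
git -c user.email=a@b.c -c user.name=test commit -qm "initial"
# simulate "uncommitted changes": edit the file, capture the diff, then revert
echo "print('world')" >> train.py
git diff > diff.txt
git checkout -- train.py
# this is the step to try manually on the agent machine:
git apply diff.txt
grep -q world train.py && echo "patch applied"
```

If `git apply diff.txt` fails on your machine with the real repo and diff, that points at the same failure the agent hits.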
Which would also mean that the system knows which datasets are used in which pipelines etc
Like input
artifacts per Task ?
Maybe this is part of the paid version, but would be cool if each user (in the web UI) could define their own secrets,
Very cool (and actually how it works), but at the end someone needs to pay for salaries 😉
The S3 bucket credentials are defined on the agent, as the bucket is also running locally on the same machine - but I would love for the code to download and apply the file automatically!
I have an idea here, why not use the "docker bash script" argument for that ?...
Yes, the container level (when these docker shell scripts run).
I think this is the tricky part, in code you can access the user ID of the Task, and download the .env and apply it, but before the process starts I can't really think of a way to do that ...
That said, I think that in the paid version they have "vault" support, which allows you to store the .env file on the clearml-server, and then the agent automatically applies it at the beginning of the container execution.
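For the "docker bash script" idea, something along these lines could go in the agent's clearml.conf; the S3 path and the sourcing step are assumptions for illustration, not a documented recipe:

```
agent {
    # shell commands executed inside the container before the Task starts
    extra_docker_shell_script: [
        "aws s3 cp s3://my-bucket/secrets/.env /root/.env",
        "set -a; . /root/.env; set +a",
    ]
}
```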
Interesting, do you think you could PR a "fixed" version ?
https://github.com/allegroai/clearml-web/blob/2b6aa6043c3f36e3349c6fe7235b77a3fddd[…]app/webapp-common/shared/single-graph/single-graph.component.ts
Hi @<1578555761724755968:profile|GrievingKoala83>
Is it possible to override the parameters through the configuration file when restarting the pipeline from the UI?
The parameters of the Pipeline are overridden from the UI, not those of the pipeline components;
you can use the pipeline parameters as-is as the pipeline components' parameters
Is your pipeline built from Tasks, or decorators over functions ?
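If it is built from Tasks, the idea looks roughly like this with a PipelineController (project/task names are placeholders, and `component_override` is a hypothetical helper that maps pipeline-level parameters onto a component's `General/...` hyperparameters):

```python
def component_override(pipeline_params: dict, prefix: str = "General") -> dict:
    """Map pipeline-level parameters onto a component's hyperparameter section."""
    return {f"{prefix}/{k}": f"${{pipeline.{k}}}" for k in pipeline_params}


def build_pipeline():
    # Requires a running clearml server; names below are placeholders.
    from clearml import PipelineController

    pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
    # this is the parameter you can override from the UI when restarting
    pipe.add_parameter(name="dataset_url", default="https://example.com/data.csv")
    pipe.add_step(
        name="process",
        base_task_project="examples",
        base_task_name="process data",
        # "${pipeline.dataset_url}" is resolved by ClearML at runtime
        parameter_override=component_override({"dataset_url": None}),
    )
    return pipe
```

Overriding `dataset_url` in the UI then flows into the component because of the `${pipeline.dataset_url}` reference.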
Hi UnevenDolphin73
I think there is an open issue on github, I'm not sure but I think there is already some internal progress
Hmmm, what's your trains version ?