Hi SuperiorDucks36
Could you post the entire log?
("could not resolve host" seems to be coming from the "git clone" call.)
Are you able to manually clone the repository on the machine running trains-agent?
Okay, this seems to be the problem
I "think" I have a clue on the issue that is lost here in the translation:
Specifically to me it all comes down to the definition of "pipeline"
From the clearml perspective:
Manual Task - code that is executed by the user (or any other mechanism Outside of the agent)
Remote Task - code that is executed by the Agent
Pipeline is a Task
Pipeline can be "manual task" but also "remote task"
Pipeline generates "remote tasks"
Task status (e.g. pipeline status as it is also a Task) can be: draft, a...
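Since the pipeline is itself a Task, its status can be queried like any other Task's; a minimal sketch (the project/pipeline names are made up):
```
from clearml import Task

# a pipeline controller is just a Task, so query its status like any Task's
pipeline_task = Task.get_task(project_name='my_project', task_name='my_pipeline')
print(pipeline_task.get_status())  # e.g. 'created' (draft), 'in_progress', 'completed'
```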
ClearML maintains a GitHub Action that sets up a dummy clearml-server,
You already have one: http://app.clear.ml (not a dummy one, but for this purpose it will work)
Thoughts?
WackyRabbit7 interesting! Are those "local" pipelines all part of the same code repository? Do they need their own environment?
What would be the easiest pipeline interface to run them locally? (I would love it if we could support this workflow; it seems you are not alone in this approach, and of course you can always run them remotely, i.e. clone the pipeline and launch it on an agent)
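For illustration, a minimal local-run sketch with PipelineController (the step function and all names are made up); start_locally() keeps both the pipeline logic and the steps on the local machine:
```
from clearml import PipelineController

def preprocess(n: int = 10):
    # made-up step logic
    return list(range(n))

pipe = PipelineController(name='local_pipeline', project='examples', version='1.0.0')
pipe.add_function_step(name='preprocess', function=preprocess,
                       function_kwargs=dict(n=10), function_return=['data'])
# run the controller logic and the steps on this machine, no agent needed
pipe.start_locally(run_pipeline_steps_locally=True)
```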
It's just the print (`__repr__`) not showing the data:
```
# 'client' is an APIClient instance
for w in client.workers.get_all():
    print(w.data)
```
Ohh RotundHedgehog76, this implies a single JupyterHub with multiple users, is that correct?
(if this is the case, then yes, clearml-session is definitely not the correct solution; I would look for a Helm chart for JupyterHub)
Okay, let me quickly run a test
I want to inject a bash command after the repo has been cloned (and maybe even after the venv has been installed).
LazyTurkey38 the created venv inherits from the system environment, so in theory you can do all the installation on the system python and the created venv will just inherit the packages, no?
(btw: just to clarify, there is only one entry point for the custom bash script and that is before everything, so users can configure the container before the agent starts)
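As an illustration of that single entry point, recent clearml versions let you attach a container startup script from code; a minimal sketch (names are made up, and the exact set_base_docker parameters may vary by version):
```
from clearml import Task

task = Task.init(project_name='examples', task_name='custom container setup')
# runs inside the container before the agent clones the repo / builds the venv
task.set_base_docker(
    docker_image='python:3.9',
    docker_setup_bash_script=['apt-get update', 'apt-get install -y graphviz'],
)
```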
GiddyTurkey39 Just making sure, you ran `ping IP`, not `ping IP:port`, right?
GiganticTurtle0
What do you mean by "reuse_last_task_id"? Each component always generates a new Task (unless it is cached, in which case the previously executed Task is reused)
What am I missing here?
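To illustrate the caching side, a minimal sketch (the step function and names are made up):
```
from clearml import PipelineController

def train(lr: float = 0.1):
    return lr * 2  # made-up step logic

pipe = PipelineController(name='cached_pipe', project='examples', version='1.0.0')
# each run normally generates a brand-new Task for the step; with
# cache_executed_step=True an identical previous run (same code + inputs)
# is reused instead of creating a new one
pipe.add_function_step(name='train', function=train, cache_executed_step=True)
pipe.start_locally(run_pipeline_steps_locally=True)
```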
GiddyTurkey39 can you ping the server-address
(just making sure, this should be the IP of the server not 'localhost')
Hi ZippySheep23
Any ideas what might be happening?
I think you exceeded the upload limit (2.36 GB) 🙂
GiddyTurkey39
I would guess your VM cannot access the trains-server, meaning an actual network configuration issue.
What are the VM IP and the trains-server IP? (The first two octets are enough, e.g. 10.1.X.Y and 174.4.X.Y.)
SuperiorDucks36, is the domain name "rz-s-git"? This does not seem like a valid domain.
EDIT:
Is it a local domain on your network?
What do you have on the trains-agent machine in "/etc/hosts"?
Does this require you to run the pipeline locally (I see you have set a default execution queue), or any other specific setup?
Yes, this means the pipeline logic runs manually/locally ("logic" meaning launching the components, not the actual compute)
Please have a go at it. I'm sure the pseudo code is missing a few details, but it should work, and I'll gladly help set it up
oh dear ...
ScrawnyLion96 let me check with the front-end guys 🙂
When I do the port forward on my own using `ssh -L`, it also seems to fail for JupyterLab and VS Code too, which I find odd
The only external port exposed is the SSH one, 10022; the client then forwards it locally (so you, the user, always get the same connection string, i.e. `ssh root@localhost -p 8022`)
If you need to expose an additional port while the clearml-session is running, open another terminal and do: `ssh root@localhost -p 8022 -L 10123:localhost:6666`
This should po...
ShallowCat10 try something similar to this one; do note that it might take a while to get all the task objects, so I would start with a single one 🙂
```
from trains import Task

tasks = Task.get_tasks(project_name='my_project')
for task in tasks:
    scalars = task.get_reported_scalars()
    # 'title' / 'original_series' are placeholders for your own scalar names
    for x, y in zip(scalars['title']['original_series']['x'],
                    scalars['title']['original_series']['y']):
        # iteration=x is assumed here, keeping the original iteration axis
        task.get_logger().report_scalar(title='title', series='new_series',
                                        value=y, iteration=x)
```
Hi JitteryCoyote63
cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server machine?
It assumes you have an agent connected to the "services" queue 🙂
That said, it also tries to delete the tasks artifacts/models etc, you can see it here:
https://github.com/allegroai/trains/blob/c234837ce2f0f815d3251cde7917ab733b79d223/examples/services/cleanup/cleanup_service.py#L89
The default configuration will assume you are running i...
have a CI/CD (e.g. GitHub Actions) that updates my "production" pipeline on the ClearML UI,
I think this is the easiest way: basically the CI/CD launches a pipeline (which under the hood is another type of Task) by querying the latest "Published" pipeline that is also not archived, then cloning it and pushing it to an execution queue.
In the UI when you want to "upgrade" the production pipeline you just right click "Publish" on the pipeline you want to launch. Another way is to do the same with Tags...
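A rough sketch of that CI/CD step (the project path, filter fields, and queue name are all assumptions):
```
from clearml import Task

# find the latest Published, non-archived pipeline task
tasks = Task.get_tasks(
    project_name='my_project/.pipelines/my_pipeline',  # hypothetical project path
    task_filter={'status': ['published'],
                 'order_by': ['-last_update'],
                 'system_tags': ['-archived']},
)
if tasks:
    cloned = Task.clone(source_task=tasks[0])
    Task.enqueue(cloned, queue_name='services')  # queue name is an assumption
```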
I have to commit the YAML with my AWS credentials to git.
CleanPigeon16 please do not 🙂
Either put them on the Task itself, or set them as OS environment variables on the machine/agent running the Task.
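For the OS-env route, a trivial sketch: export the credentials once on the agent machine and read them from the environment inside the task code, so they never touch git:
```
import os

# on the machine running the agent (outside git):
#   export AWS_ACCESS_KEY_ID=...
#   export AWS_SECRET_ACCESS_KEY=...
aws_key = os.environ['AWS_ACCESS_KEY_ID']
aws_secret = os.environ['AWS_SECRET_ACCESS_KEY']
```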
Regarding where it is stored (I think the default is the DevOps project; I need to look at the code)
Can the host server's service agent be used?
In theory yes, just make sure you expose the containers network (check the docker compose)
Also btw, is this supposed to be a screenshot from the community version?
Hmm, seems like a screenshot from an enterprise version, I'll ask them to update 🙂
I am also not understanding how clearml-serving is doing the versioning for models in Triton.
Basically you have two Tasks, one is the "controller" checking model changes and updating itself.
The other is the engine, checking on the "controller" Task, which models it needs to download/configure and replaces them.
This way you can ha...
VexedCat68
. So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
Are you uploading the checkpoints manually with artifacts? or is it autologged & uploaded ?
Also, why not reuse and overwrite the older checkpoints?
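For reference, the overwrite approach could look like this (PyTorch assumed; the model and loop are stand-ins):
```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in model
for epoch in range(3):
    # ... training step would go here ...
    # saving to the same path every epoch overwrites the file, so the
    # auto-logger updates one output model instead of piling up checkpoints
    torch.save(model.state_dict(), 'checkpoint.pt')
```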
@<1570220844972511232:profile|ObnoxiousBluewhale25> it creates a new Model here
If you want it to log to something other than the default file server, create the ClearML Task before starting the training:
```
task = Task.init(..., output_uri="file:///home/karol/data/")
# now run the training
```
It will use the existing Task and upload to the destination folder