FileNotFoundError: [Errno 2] No such file or directory
Could it be the file you are trying to run is not in the repository?
Are you running inside a Docker container?
Any chance you can send the full log?
Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Hmm, so maybe a "glob"-like parameter, e.g. get_local_copy(select_filter='subfolder/*')?
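Purely hypothetical usage of that proposed parameter (select_filter does not exist today, it is only the suggestion above):
```
from clearml import Dataset

# hypothetical: only fetch the files matching the glob instead of the whole dataset
ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
folder = ds.get_local_copy(select_filter="subfolder/*")
```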
Hi PanickyMoth78
You mean like another Task? Or maybe a Slack message?
ContemplativeGoat37
1. It seems the DNS resolution to the server fails ("Temporary failure in name resolution"). Is this running on an agent, or manually?
Regarding "clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###": is this you manually aborting the Task, or is it aborting itself due to the connectivity issue?
4. What are the clearml / clearml-agent versions?
Fixed in `pip install clearml==1.8.1rc0`
🙂
I'm sorry, I mean if the queue name is not provided to the agent, the agent will look for the queue with the "default" tag. If you are specifying the queue name, there is no need to add the tag.
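For example (a minimal sketch, the queue name is a placeholder):
```
# the agent pulls from an explicitly named queue, so no "default" tag is needed
clearml-agent daemon --queue my_queue
```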
Is it working now?
Different question: how can I pass the PYTHONPATH env variable to a task run by the agent (so python can find classes inside my subdirectories)?
Hi HelpfulHare30
By default the working directory will be added to the python path. This means if I have under Execution:
Working Dir: "."
Script: "src/script.py"
The root git repo will be added to the python path.
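For example (an illustrative layout, the module names are made up):
```
# repo layout:
#   my_repo/
#   ├── src/script.py
#   └── utils/helpers.py
#
# With Working Dir "." and Script "src/script.py", the repo root is on the python path,
# so inside src/script.py this import resolves without touching PYTHONPATH:
from utils.helpers import my_helper
```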
BTW: with the next RC you could add a flag to the agent to always add the git repo to the python path.
So we basically have two options. One is when you call Dataset.get_local_copy() we register it on the Task automatically; the other is more explicit, with something like:
ds = Dataset.get(...)
folder = ds.get_local_copy()
task.connect(ds, name="train")
...
ds_val = Dataset.get(...)
folder_val = ds_val.get_local_copy()
task.connect(ds_val, name="validate")
wdyt?
Hi DrabCockroach54
I think the Kubernetes integration (k8s glue) is not part of the open-source features, and is only available as an enterprise feature 🙂
Hi @<1523702000586330112:profile|FierceHamster54>
I think I'm missing a few details on what is logged, and a reference to the git repo?
... indicate the job needs to be run remotely? I'm imagining something like
clearml-task
and you need to specify the queue to push your Task into.
See here: https://clear.ml/docs/latest/docs/apps/clearml_task
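Something along these lines (a sketch, the project/script/queue names are placeholders):
```
# create a Task from a local script and enqueue it for remote execution
clearml-task --project examples --name remote_run --script train.py --queue default
```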
I want to use the services queue for running services, and I want to do it on k8s
So yes, as a standalone pod with the agent in venv mode (as opposed to docker mode)
Does that make sense to you?
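Roughly, the pod's entrypoint would just run something like (a sketch, assuming venv mode and the dedicated services queue):
```
# agent serving the "services" queue in venv mode, allowing multiple service tasks in parallel
clearml-agent daemon --queue services --services-mode
```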
Hi TroubledJellyfish71
What do you have listed in the Task's Execution "installed packages" section (of the original Task)?
How did it end up with an http link for pytorch?
Usually it would be torch==1.11
...
EDIT:
I'm assuming the original Task was executed on a Mac M1; what are you getting when calling pip freeze?
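For reference, the difference usually looks something like this (the wheel URL below is made up):
```
# what you would expect in "installed packages"
torch==1.11.0
# what a pip freeze on an M1 Mac can produce instead (a direct wheel link)
torch @ https://example.com/torch-1.11.0-none-macosx_11_0_arm64.whl
```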
And where is the agent running? (And is it venv or docker mode?)
I see, actually what you should do is a fully custom endpoint:
- preprocessing -> download the video
- processing -> extract frames and send them to Triton over gRPC (see below how)
- post-processing -> return a human readable answer
Regarding the processing itself, what you need is to take this function (copy paste):
None
have it as internal `_process...
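For context, the custom code would roughly follow the clearml-serving Preprocess template, something like this sketch (I'm assuming the preprocess/process/postprocess method names; the bodies are placeholders only):
```
from typing import Any


class Preprocess(object):
    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # here: download the video referenced in the request body and return a local path
        return body

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # here: extract frames and send them to Triton over gRPC, returning the raw outputs
        return data

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # here: turn the raw model outputs into a human readable answer
        return {"answer": data}
```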
Okay that means it is running in virtual environment mode.
On the original Task (the one you enqueued) what were the installed packages (specifically the torch/torchvision)?
Yes, in tandem with the experiments (because they constantly log to the server).
That said, with 0.16 we added offline mode, so you can run in offline mode, then import the experiment into the system.
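In recent clearml versions the flow looks roughly like this (method names per the current clearml API, so the 0.16 Trains-era naming may differ slightly):
```
from clearml import Task

# enable offline mode before Task.init; everything is written to a local session folder/zip
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline_run")
# ... run the experiment as usual ...
task.close()

# later, on a machine that can reach the server, import the session:
# Task.import_offline_session("/path/to/offline_session.zip")
```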
Nesting in the UI is not possible, I think?
Yes, but the next version will have nested projects, that's something 🙂
I mean that it is possible to start the subtask while the main task is still active.
You cannot call another Task.init while a main one is running.
But you can call Task.create and log into it; that said, the automatic logging is not supported on the newly created Task.
Maybe the easiest solution is just to do the "sub-tasks" and close them. That means the main Task i...
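Something like this sketch of the idea (project/task names are placeholders; remember there is no automatic logging on the created task):
```
from clearml import Task

# main experiment, created with Task.init as usual
main_task = Task.init(project_name="examples", task_name="main")

# a secondary task created while the main one is still running, logged into explicitly
sub_task = Task.create(project_name="examples", task_name="sub-task")
sub_task.get_logger().report_scalar("loss", "train", value=0.1, iteration=0)
sub_task.close()

main_task.close()
```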
Thanks BitterStarfish58 !
It doesn't seem to be related to the upload. The upload itself finished... What's your Trains version?
How so? They are in one place; the creation of the venv is transparent, and the packages that are there are everything you have in the docker, plus the ability to override them from the UI.
What am I missing here?
GrievingTurkey78 yes, you are correct on both.
Will the sweep functionality work?
Yes it should, that said, it will not use the trains-agent,
so you are limited to the machine running the sweep.
If you want to do HPO on multi-node, check out this example 🙂
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
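The gist of that example, schematically (class/argument names as in the clearml automation module, trains.automation in older versions; the values are placeholders):
```
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

# drive clones of an existing (template) experiment over a parameter range,
# pushing each clone into the execution queue the agents listen on
optimizer = HyperParameterOptimizer(
    base_task_id="<template_task_id>",
    hyper_parameters=[
        UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```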
Yes the clearml-server AMI - we want to be able to back it up and encrypt it on our account
I think the easiest and safest way for you is to actually have full control over the AMI, and recreate it once from scratch.
Basically any ubuntu/centos + docker and docker-compose should do the trick, wdyt ?
GiganticTurtle0 fix was just pushed to GitHub 🙂
pip install git+
SoggyFrog26 you'll have it in the next RC 🙂
Not sure what the plan is, I know one should be out today/tomorrow, worst case it will be on the next one 🙂