PanickyMoth78 ScantMoth28
With several models saved by the training process (whose code is not task-aware)
You can actually specify which models are saved: task = Task.init(..., auto_connect_frameworks={'pytorch': ['*.pt']})
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
This way you can upload only the model you need.
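A minimal sketch of the filter above, under a few assumptions: `model_filter`, `matches` and `start_task` are hypothetical helpers, the project and task names are placeholders, and the `fnmatch` check only illustrates how a `*.pt` wildcard behaves (it is not a claim about how ClearML matches file names internally).

```python
from fnmatch import fnmatch

# Wildcards for the PyTorch checkpoints we want logged and uploaded.
PATTERNS = ["*.pt"]


def model_filter(patterns):
    """Build the auto_connect_frameworks value shown in the message above."""
    return {"pytorch": list(patterns)}


def matches(filename, patterns=PATTERNS):
    """Illustration only: shell-style wildcard check on a file name."""
    return any(fnmatch(filename, p) for p in patterns)


def start_task():
    # Imported lazily so the helpers above work without clearml installed.
    from clearml import Task

    # Project and task names are placeholders (assumptions).
    return Task.init(
        project_name="examples",
        task_name="filtered model upload",
        auto_connect_frameworks=model_filter(PATTERNS),
    )
```

With this filter, a file such as `best.pt` would be a candidate for upload while `last.ckpt` would not.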
Okay so my thinking is, on the pipelinecontroller / decorator we will have: abort_all_running_steps_on_failure=False
(if True, on step failing it will abort all running steps and leave)
Then per step / component decorator we will have: continue_pipeline_on_failure=False
(if True, on step failing, the rest of the pipeline dag will continue)
GiganticTurtle0 wdyt?
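To make the proposed per-step flag concrete, here is a toy simulation in plain Python. It is not ClearML code: `simulate_pipeline` is a hypothetical helper, steps run one at a time (so the controller-level abort flag is not modelled), and it assumes that steps depending on a failed step are skipped even when the rest of the DAG continues.

```python
def simulate_pipeline(steps, failing, continue_pipeline_on_failure):
    """Toy model of the proposed semantics.

    steps: list of (name, parents) in execution order.
    failing: set of step names that fail when executed.
    continue_pipeline_on_failure: {step name: bool}, defaults to False.
    Returns {step name: "completed" | "failed" | "skipped"}.
    """
    status = {}
    stopped = False  # set once a failure is allowed to stop the pipeline
    for name, parents in steps:
        parents_ok = all(status[p] == "completed" for p in parents)
        if stopped or not parents_ok:
            status[name] = "skipped"
        elif name in failing:
            status[name] = "failed"
            if not continue_pipeline_on_failure.get(name, False):
                stopped = True
        else:
            status[name] = "completed"
    return status


steps = [
    ("prep", []),
    ("train_a", ["prep"]),
    ("train_b", ["prep"]),
    ("report", ["train_a"]),
]
# train_a fails; with the flag set, train_b still runs but report is skipped.
print(simulate_pipeline(steps, {"train_a"}, {"train_a": True}))
```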
Exactly, that's my problem: I want to remove it to make sure it is reinstalled (because the version can change)
JitteryCoyote63 yes, this is definitely a pip bug... can you test with the latest pip version, maybe it was fixed? (i.e. git+https:// link)
With env caching enabled, it won't reinstall this private dependency, right?
It will, local packages (".") and git packages are always reinstalled even if using venv caching, exactly for that reason :)
JitteryCoyote63 you mean? (notice no brackets) task.update_requirements(".")
Either pass a text or a list of lines:
The safest would be '\n'.join(all_req_lines)
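A small sketch of the "text" option, assuming the requirement lines come from a list (for example, read from a file). `requirements_text` is a hypothetical helper and the package names are placeholders; the resulting string is the kind of value the call mentioned above would receive.

```python
def requirements_text(all_req_lines):
    """Join requirement lines into one newline-separated string,
    dropping blank lines and surrounding whitespace."""
    cleaned = [line.strip() for line in all_req_lines]
    return "\n".join(line for line in cleaned if line)


# Placeholder requirement lines, including the local package ".".
all_req_lines = ["numpy>=1.20\n", "", "requests", "."]
print(requirements_text(all_req_lines))
```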
and when you remove the "." line does it work?
Ohh so the setup.py is the one containing these requirements, oops I totally missed that :( let me check what pep has to say about that ... (Basically this is not a clearml issue but a pip one...)
I prepared my own image and want to use this venv
No worries, it creates a "transparent" venv that uses everything from the docker (the penalty of creating a new venv is negligible :) and you end up with the exact same set of packages)
The only weird thing to me is not getting any "connection warnings" if this is indeed a network issue ...
ScantWorm7
Tensorboard is automatically captured and sent to the trains server. This is in addition to the local copy of your TB files. Actually in most cases the local copy is redundant
Hi TimelyRabbit96
Trying to do model inference on a video, so first step in Preprocess class is to extract frames.
Basically this depends on the REST API; usually you would be sending a link to the data, to be processed and returned synchronously.
What you should have is a custom endpoint doing the extraction, which sends the raw data into another endpoint doing the model inference. Basically think "pipeline" endpoints:
[None](https://github.com/allegro...
MelancholyElk85 notice there is the pipeline controller queue (i.e. which agent will run the logic of the pipeline), and the default queue for the pipeline steps (i.e. the actual steps of the pipeline).
The default queue for the pipeline logic itself is services. You can change it (pipeline.start(..., queue='another_q'))
Makes sense?
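The two queues can be sketched as follows. `launch_pipeline` is a hypothetical helper and the queue names are placeholders; it assumes `pipe` is a `clearml.PipelineController` (or any object exposing the same two methods), so it can be tried with a stub before pointing it at a real server.

```python
def launch_pipeline(pipe, controller_queue="services", steps_queue="default"):
    """Set the default queue for the steps, then enqueue the controller.

    steps_queue: where agents pick up the actual pipeline steps.
    controller_queue: where the agent running the pipeline logic listens.
    """
    pipe.set_default_execution_queue(steps_queue)
    pipe.start(queue=controller_queue)


# Usage sketch (names are placeholders):
#   from clearml import PipelineController
#   pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
#   ... add steps ...
#   launch_pipeline(pipe, controller_queue="another_q")
```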
ColossalDeer61 btw, it turns out the docker-compose services docker was ill-configured on GitHub :) I suggest you get the latest copy of it:
curl -o docker-compose.yml
Since I'm assuming there is no actual task to run, and you do not need to setup the environment (is that correct?)
you can do:
$ CLEARML_OFFLINE_MODE=1 python3 my_main.py
wdyt?
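If the script is launched from Python rather than a shell, the same environment variable can be set for the child process. A minimal sketch, assuming the script name `my_main.py` from the command above; `offline_env` and `run_offline` are hypothetical helpers.

```python
import os
import subprocess
import sys


def offline_env(base=None):
    """Return a copy of the environment with ClearML offline mode switched on."""
    env = dict(os.environ if base is None else base)
    env["CLEARML_OFFLINE_MODE"] = "1"
    return env


def run_offline(script="my_main.py"):
    # Equivalent to: CLEARML_OFFLINE_MODE=1 python3 my_main.py
    return subprocess.run([sys.executable, script], env=offline_env(), check=True)
```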
It's the same but done from outside, you want the same and "offline" as well right?
oh, if this is the case, why not use the "main" server?
MagnificentPig49 quick update, front-end guys updated me that with the next trains-server update they will have the web client code available on the repository, ETA probably mid May or so :)
That would be great! Might have to use 2>/dev/null in some of my bash scripts
Feel free to test and PR :)
One other question regarding connecting. We have setup sshd inside the docker image we are using.
Actually the remote session opens port 10022 on the host machine (so it does not collide with the default ssh port)
It actually runs an additional sshd inside the docker, setting its port.
And the clearml-session will ssh directly into the container sshd...
As we use a custom CUDA image, we do not want this running on user login, and get ugly error messages about missing symlinks.
You can customize the startup bash script (running inside any container) here:
https://github.com/allegroai/clearml-agent/blob/bf07b7f76d3236c1118b81730c6d9718705a795a/docs/clearml.conf#L145
LackadaisicalOtter14 Would that help?
Hi LackadaisicalOtter14
Is it possible to remove this line to stop it from being executed?
Everything is possible :) I think the main question is why it is there (which, to the best of my understanding, is to solve for any CUDA drivers and installed packages, meaning anything that is installed at runtime)
I think we can suppress the error, wdyt? 'echo "ldconfig" 2>/dev/null >> /etc/profile && '
Thanks ScantChimpanzee51 !
Let me see what I can find, should be easy enough to fix now :)
Hmm, so this is kind of a hack for ClearML AWS autoscaling ?
and every instance is running an agent? or a single Task?
Hi ExcitedCat13
Sure, download the plugin from the git repo (Install instructions in the repo).
Regarding remote debugging, are you referring to SSH?
The plugin itself is designed to make sure that when you work on a remote machine with pycharm clearml will log the local git repo and changes (as the .git folder is not synced to the remote machine)
I'll make sure they get back to you
Are you also adding those metrics to the experiment table as extra columns ?