the trend step artifact is used to keep track of the time of the data, so we know the expected trend of the input data. For example, on the first data point trend_step = 1 and the trend value is 10; then if trend_step = 10 (the tenth data point), our regressor will predict the trend value for the selected trend_step. This method is still in research to make it more efficient so it doesn't need to upload the artifact on every request
Makes sense! I would suggest you add a GitHub issue with a feature request ...
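For reference, a minimal sketch of how such a counter could be kept as a task artifact between requests; the artifact name `trend_step` is taken from the question above, and the surrounding serving setup is assumed:

```python
from clearml import Task

# assumes a task is already running (e.g. the preprocessing/serving task)
task = Task.current_task()

# persist the latest trend step so the next request can pick it up
task.upload_artifact(name="trend_step", artifact_object={"trend_step": 10})

# later (possibly from another process) read the stored value back
stored = Task.get_task(task_id=task.id).artifacts["trend_step"].get()
print(stored["trend_step"])
```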
That would be great! Might have to use
2>/dev/null
in some of my bash scripts
Feel free to test and PR :)
One other question regarding connecting. We have set up sshd inside the docker image we are using.
Actually the remote session opens port 10022 on the host machine (so it does not collide with the default ssh port)
It actually runs an additional sshd
inside the docker, setting its port.
And the clearml-session will ssh directly into the container sshd...
Wait ResponsiveHedgehong88 I'm confused, if you integrated your code with clearml, didn't you run it manually even once (on any machine, local/remote)?
when I duplicate the experiment and clone it remotely, the call is ignored and the recorded values are used?
Yes ScantChimpanzee51 exactly.
Think of it as the initial value you want to put on the Task when you are running the code on your machine; later, when you clone the Task, you can edit the base docker image in the UI (or with the API). Of course the new value is used when the agent spins this Task, and to avoid the actual docker (the one you changed in the UI) being overwritten by ...
BTW: any specific reason for going the REST API way and not using the Python SDK?
I mean, can you install it with something like: pip install git+ ?
Basically the agent will install the main repository, and any git submodules. But it cannot install multiple repositories, as the directory structure might be too much.
wdyt?
Hi SteadyFox10, the way it works is that Trains limits the debug image history by reusing the same file names, so the UI will only present the iterations for which the debug images are relevant. With your sample code it looks like it exposes a bug: the generated link should contain the iteration number, but it does not, so it overwrites the debug images every iteration. Here is the image link: https://demofiles.trains.allegro.ai/Test/test_images.6ed32a2b5a094f2da47e6967bba1ebd0/metrics/Test/te...
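For context, a hedged sketch of how debug images are normally reported with an explicit iteration; the values here are illustrative, and max_image_history is the knob that limits how many image files are kept:

```python
import numpy as np
from clearml import Task

task = Task.init(project_name="Test", task_name="test_images")
logger = task.get_logger()

for i in range(10):
    img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
    # each call is indexed by `iteration`; the file history is capped by max_image_history
    logger.report_image(title="Test", series="test", iteration=i, image=img, max_image_history=5)
```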
Hi GrittyKangaroo27
Is it possible to import user-defined modules when wrapping tasks/steps with functions and decorators?
Sure, any package (local included) can be imported, and will be automatically listed in the "installed packages" section of the pipeline component Task
(This of course assumes that on a remote machine you could do "pip install <package>")
Make sense ?
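A minimal sketch of a pipeline component that pulls in an extra package (function and package names are made up; the same idea applies to a locally installable package):

```python
from clearml import PipelineDecorator

# `packages` ends up in the component's "installed packages",
# so the agent can pip-install it on the remote machine
@PipelineDecorator.component(return_values=["mean_value"], packages=["pandas"])
def compute_mean(csv_path):
    import pandas as pd
    return pd.read_csv(csv_path)["value"].mean()

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.1")
def run_pipeline(csv_path):
    print(compute_mean(csv_path))

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    run_pipeline(csv_path="data.csv")
```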
FrustratingWalrus87 Unfortunately TB's TSNE is not automatically captured by ClearML (Scalars, histograms etc. are)
That said, matplotlib is automatically captured, so you can run your own PCA/t-SNE and use matplotlib to visualize it (ClearML will capture the figure).
The same applies for plotly.
What do you think?
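A hedged sketch of that workaround, assuming scikit-learn is available; the t-SNE scatter plot is drawn with matplotlib and picked up automatically after Task.init:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE
from clearml import Task

task = Task.init(project_name="examples", task_name="tsne plot")

# run your own t-SNE on some embedding matrix
embeddings = np.random.rand(200, 64)
points = TSNE(n_components=2, init="random", perplexity=30).fit_transform(embeddings)

# the matplotlib figure is captured automatically by ClearML
plt.scatter(points[:, 0], points[:, 1], s=5)
plt.title("t-SNE projection")
plt.show()
```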
Hi @<1569496075083976704:profile|SweetShells3>
These environment variables are injected into the new process, are you passing them via the vault?
None
CooperativeSealion8 let me know if you managed to solve the issue, also feel free to send the entire trains-server log. I'm assuming one of the dockers failed to boot...
As I installed ClearML using pip,
Where is clearml-serving running? Usually your configuration file is in ~/clearml.conf
Notice that if it is not there, it means the defaults are being used, so just create a new one and add that line
That is quite neat! You can also put a soft link from the main repo to the submodule for better visibility
Because submodules inside a git repo are basically a requirement for the repo to run; skipping a few or selecting them manually would break the agent. That said, maybe a shallow clone might be easier or faster. Regardless, it should be an environment variable passed per Task. Feel free to add a GitHub issue request; if this is not a unique edge case we will add it
no, i just commented it and it worked fine
Yeah, we should add a comment saying "optional", because it looks as if you need to have it there even if you are not using Azure.
IrritableJellyfish76 if this is the case, my question is what is the reason to use Kubeflow? (spinning a JupyterLab server is a good answer, for example; pipelines, in my opinion, much less so)
Hi @<1533620191232004096:profile|NuttyLobster9>
First nice workaround!
Second could you send the full log? When the venv is skipped then pytorch resolving should be skipped as well, and no error should be raised...
And lastly, could you also send the log of the task that executed correctly (the one you cloned)? Because you are correct, it should have been the same
What we would like ideally, is a system where development, training, and deployment are almost one and the same thing, to reduce the lead time from development code to production models.
This is very aligned with the goals of ClearML 🙂
I would like to understand more about what is currently missing in ClearML so we can better support this approach
my inexperience in using them a lot until recently. I can see how that is a better solution
I think I failed in explaining myself, I me...
it is a pickle issue
"package model doesn't exist"
Sounds like it, why do you think clearml
has anything there ?
BTW:
import_bind.__patched_import3
this is just so packages that clearml auto-connects with are patched even if they are imported after Task.init was called.
I see it's a plotly plot, even though I report a matplotlib one
ClearML tries to convert matplotlib plots into plotly objects so they are interactive; if it fails, it falls back to a static image, as in matplotlib
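If the automatic conversion is not wanted, a possible workaround (a sketch using the Logger API, project/task names are placeholders) is to report the figure explicitly and ask for a static image:

```python
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="matplotlib report")

fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])

# report_image=True uploads the figure as a static image
# instead of converting it into an interactive plotly plot
task.get_logger().report_matplotlib_figure(
    title="my plot", series="static", figure=fig, iteration=0, report_image=True
)
```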
We use an empty queue to enqueue our tasks in, just to trigger the scheduler
its only importance is that the experiment is not enqueued anywhere else, but the trigger then enqueues it
🙂
It's just that the trigger is never triggered
(Except when a new task is created - this was not the case)
Is the trigger controller running on the services queue ?
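For reference, a rough sketch of a trigger controller running as a service; queue names, project names and the template task id are placeholders:

```python
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)

# when a task in the watched project completes, enqueue the template task
trigger.add_task_trigger(
    name="retrain-on-completion",
    schedule_task_id="<template_task_id>",
    schedule_queue="default",
    trigger_project="examples",
    trigger_on_status=["completed"],
)

# run the trigger controller itself on the services queue
trigger.start_remotely(queue="services")
```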
Hi @<1657918706052763648:profile|SillyRobin38>
You should either disable certificate verification or add the self-signed certificate to your urllib
None
or set
export REQUESTS_CA_BUNDLE="/path/to/cert/file"
export SSL_CERT_FILE="/path/to/cert/file"
I think the part that is missing for me is the context, in other words how would one configure the execution_plan
and why would they configure it in a specific way?
My intuition, without fully understanding it, is that for some reason the internal DAG/decision is exposed to the user, and it feels like too much information. Basically I have a hunch that the users should not need to have such deep understanding to control the flow, and they should end up with an abstraction on top of it. ...
then will clearml associate that image with my experiment and always use that image with it,
when you say "agent to use my docker image," I'm assuming you mean the configuration file or the --docker argument; in both cases this means the default container.
This means that if the Task does not specify a docker image, the agent will use the one you set in the conf file / argument. But a Task can always specify a different docker image, and the agent will pull the requested image based on the Task's entry.
Eve...
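To make the distinction concrete, a small sketch (the image name is arbitrary) of setting the per-Task container from code; this value is stored on the Task and overrides the agent's default container when an agent later executes it:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker override")

# stored on the Task itself; the agent uses it instead of the
# default container from clearml.conf / the --docker argument
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04")
```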
I can probably have a python script that checks if there are any tasks running/pending, and if not, runs docker-compose down to stop the clearml-server, then uses boto3 to trigger the creation of a snapshot of the EBS volume, waits until it is finished, and then restarts the clearml-server, wdyt?
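A rough sketch of that idea, assuming boto3 and docker-compose are available; the volume id and compose file path are placeholders:

```python
import subprocess

import boto3
from clearml import Task

# check whether anything is still running or queued
active = Task.get_tasks(task_filter={"status": ["in_progress", "queued"]})
if not active:
    # stop the server so the EBS volume is in a consistent state
    subprocess.run(["docker-compose", "-f", "/opt/clearml/docker-compose.yml", "down"], check=True)

    ec2 = boto3.client("ec2")
    snapshot = ec2.create_snapshot(VolumeId="<volume-id>", Description="clearml-server backup")

    # wait for the snapshot to complete before restarting
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

    subprocess.run(["docker-compose", "-f", "/opt/clearml/docker-compose.yml", "up", "-d"], check=True)
```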
I'm pretty sure there is a nice way, let me check something
From the top
1. trains-agent pulls a service Task
2. the Task is marked as running, and the trains-agent worker points to the Task
3. the Docker container is spun up
4. the environment is installed inside the docker (results are shown in the service Task log)
5. trains-agent inside the docker is launched, a new node appears in the system as <host_agent_name>:service:<task_id>, and the Task service is listed as running on it
6. the main trains-agent is back to idle and its worker now has no experiment listed as running
Where do you think it breaks?
CooperativeFox72 btw, are you guys running those 20 experiments manually or through trains-agent ?