Hi @<1690896098534625280:profile|NarrowWoodpecker99>
Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,
Yes, it does.
Are there configuration options available that allow us to control this behavior?
I'm assuming you're thinking of dynamically loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial, and in reality, to be fast enough the model has to live in RAM and then be moved to the GPU (...
Hi RipeGoose2
What exactly is being uploaded? Are those the actual model weights or intermediate files?
So I'd like to use the command line argument in the first argparse, and then hide/delete/override it before running the second argparse.
Nice hack!
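For reference, a minimal sketch of that two-parser trick (the argument names --config and --lr are just placeholders, not from the original thread):
```python
import argparse

# First parser: only extracts the bootstrap argument ("--config" is hypothetical)
first = argparse.ArgumentParser(add_help=False)
first.add_argument("--config", default=None)
known, remaining = first.parse_known_args()

# The bootstrap argument was consumed above, so it is no longer in `remaining`
# and the second parser never sees it
second = argparse.ArgumentParser()
second.add_argument("--lr", type=float, default=0.01)
args = second.parse_args(remaining)
```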
AttributeError: 'PosixPath' object has no attribute 'loc'
SarcasticSquirrel56 I'm assuming the artifact is a pandas DataFrame and you forgot to either import pandas beforehand or add it as a requirement for the Task 🙂
This is causing the artifact .get() method to revert to returning the local path to the artifact, instead of actually de-serializing it.
(We should print a warning though, I'll make sure we do 🙂 )
EDIT: basically clearml failed to realize you also need pandas because it was never imported ...
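To illustrate (the task id and artifact name below are placeholders): importing pandas explicitly lets clearml record it as a requirement and de-serialize the artifact back into a DataFrame:
```python
import pandas as pd  # explicit import so clearml records pandas as a requirement
from clearml import Task

task = Task.get_task(task_id="...")               # placeholder task id
df = task.artifacts["my_dataframe"].get()          # with pandas available, returns a DataFrame
print(df.loc[0])                                   # .loc now works as expected
```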
Notice that the StorageManager has default configuration here:
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L76
Then a per-bucket credentials list, with details:
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L81
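At runtime the configured credentials are picked up transparently; for example (the bucket and path below are placeholders):
```python
from clearml import StorageManager

# Downloads the object using the credentials resolved from the conf sections above
# (the default section, or the matching per-bucket entry if one is defined)
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/model.pkl")
print(local_path)
```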
Essentially the example provided just prints out IDs to the log file,
What do you mean?
SmugOx94 could you please open a GitHub issue with this request, otherwise we might forget 🙂
We might also get some feedback from other users
ReassuredTiger98 that is a good point, at the moment they are designed as "machine level" configs, but we do have built-in support to allow multiple configurations. The technical issue is we have to read the configuration file before we initialize the Task object, which means we are still not aware of the git root (which I assume is where we could put a configuration file)
BTW: regarding detect_with_conda_freeze, we hope that this flag is rarely used, as ClearML should auto-detect t...
when you say "configuration files" are you referencing the dict in the mock example ?
that embed seems to be slightly off with regards to where the link is actually pointing to
I think this is the Slack preview... 😞
So General would have created a General instead of Args?
yes,
This is a must, you have to specify the hyperparameters section you are referencing.
https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/examples/pipeline/step2_data_processing.py#L21
This is actually:
task.connect(args, name='General')
Basically there is no "random_state", only "General/random_state"
Make sense?
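A minimal sketch of what that looks like (parameter values here are illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="step2_data_processing")
params = {"random_state": 42, "test_size": 0.2}
# Connecting under the "General" section means the parameter is addressed
# as "General/random_state" (not just "random_state") when overriding it
params = task.connect(params, name="General")
```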
Then it initiates a run on AWS, which I want to use the same task-id.
BoredPigeon26 Clone the Task, it basically creates a new copy (of the setup/configuration etc.).
Then you can launch it on an aws instance (I'm assuming with clearml-agent)
wdyt?
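Something along these lines (the task id and queue name are placeholders):
```python
from clearml import Task

template = Task.get_task(task_id="original_task_id")            # the task to copy
cloned = Task.clone(source_task=template, name="cloned for aws run")
Task.enqueue(cloned, queue_name="aws_queue")                    # picked up by a clearml-agent on the AWS instance
```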
But it writes over the execution tab in the GUI
It does, you are correct; it will however not overwrite the reports (logged scalars etc.)
Hi SmallDeer34
Can you try with the latest RC? I think we fixed something with the jupyter/colab/vscode support!
pip install clearml==1.0.3rc1
but when the dependencies are installed, the git creds are not taken into account
I have to admit, we missed that use case 😞
A quick fix will be to use git ssh, which is system wide.
but now I want to switch to git auth using a Personal Access Token for security reasons)
Smart move 😉
As for the git repo credentials, we can always add them, when you are using user/pass. I guess that would be the behavior you are expecting, unless the domain is different......
JitteryCoyote63 try to add the prefix to the parameter name, e.g. instead of "artifact_name" use "Args/artifact_name"
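The surrounding context isn't shown here, but assuming the parameter is being set/overridden programmatically, a sketch with the section-prefixed name would be:
```python
from clearml import Task

task = Task.get_task(task_id="...")  # placeholder task id
# Use the section-prefixed name, not the bare parameter name
task.set_parameters({"Args/artifact_name": "my_artifact"})
```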
Hmm let me check something
Could not find a version that satisfies the requirement pytorch~=1.7.1
Seems like pytorch 1.7.1 has no package for python 3.7 ?
I will create a minimal example.
Many thanks ReassuredTiger98 !
@<1689446563463565312:profile|SmallTurkey79> could you attach the full log of the Task?
also I would recommend "export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" (i.e. 1, not "true")
Usually binary env vars are 0/1
(I can see that the docs here: None
never mention it, I'll ask them to add that)
Would that go under arguments?
yes 🙂
Also what is the base path where the git repo is cloned? So if my repo is called myProject.git, what would the full path be?
For example https://github.com/<user>/myProject.git
btw: how come you do not have this field auto-populated from running the code locally or using the clearml-task CLI?
JitteryCoyote63 with pleasure 🙂
BTW: the Ignite TrainsLogger will be fixed soon (I think it's on a branch already by SuccessfulKoala55) to fix the bug ElegantKangaroo44 found. It should be in an RC next week.
Hi @<1532532498972545024:profile|LittleReindeer37>
This is truly a great discussion to have. Personally I think the main difference is that software development is a somewhat linear process, and git captures it very well. But ML is a much wider, nonlinear process, which to me means that trying to force the same workflow into a dev tree will end up failing. The way ClearML thinks about it (and I think the analogy to source control is correct) is probably closer to how you think about proj...
My pleasure, and apologies 🙂
or do you mean the agent can convert an https url to ssh??
Yep it does that automatically if you set: force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/docs/clearml.conf#L25
You can control it with the auto_* arguments in the Task.init call
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit
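For example, a minimal sketch turning a few of them off (the full set of auto_* switches is in the reference linked above):
```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="manual logging",
    auto_connect_arg_parser=False,   # do not capture argparse arguments automatically
    auto_connect_frameworks=False,   # do not hook framework logging (e.g. PyTorch, Matplotlib)
    auto_resource_monitoring=False,  # do not report CPU/GPU/memory scalars
)
```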
Notice that in your execute_remotely() you did not specify a queue to put the current Task into
What it does is stop the currently running code and put the newly created task into the specified queue; if you do not specify a queue, it will just abort it and wait for you to manually enqueue it.
To solve it:
task.execute_remotely(queue_name='my_queue')
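In context it would look something like this (the queue name is a placeholder):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# Local execution stops here; the task is enqueued on "my_queue"
# and a clearml-agent listening on that queue will run it
task.execute_remotely(queue_name="my_queue", exit_process=True)
```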
Hi DepressedChimpanzee34, took me a while but I think there is a solution:
In your docker-compose file, replace:
https://github.com/allegroai/clearml-server/blob/a64c4d264d00eadd2d11818b37151d3cc6266d99/docker/docker-compose.yml#L5
with:
entrypoint: /bin/bash
command: -c "mkdir -p /var/log/clearml && cd /opt/clearml/ && python3 -m apiserver.apierrors_generator && gunicorn -w 4 -t 600 --bind=0.0.0.0:8008 apiserver.server:app"
It should actually work the same, if you find out it fails to properly register let me know (and then I guess a github issue is the next step)