Very Cool!
BTW guys, are you using task.models[] to continue from the last checkpoint? Or is it task.artifacts[]?
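i.e. something along these lines (a rough sketch on my side; the project/task names are placeholders):
```
# a hedged sketch: resuming from the last output model of a previous task
from clearml import Task

prev_task = Task.get_task(project_name="my_project", task_name="my_training_task")
# models registered on the task are exposed under task.models["output"]
last_model = prev_task.models["output"][-1]
checkpoint_path = last_model.get_local_copy()  # downloads the checkpoint file locally
```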
The latest TAO doesn't use python for fine tuning, rather it uses the CLI entirely
It's a good question, but I think the CLI actually just runs python code under the hood (the CLI is their interface). Generally speaking, I'm pretty sure it will not be complicated to convert the TLT integration to support TAO (Nvidia helps with that, and I think we had a similar process with Nvidia Clara/MONAI)
BTW: how are you using Nvidia TAO ?
UnevenDolphin73 are you positive, is this reproducible? What are you getting?
packages are updated, and I don't know which python version I get, + changing the python version of the OS is not really recommended
Wait I'm confused, this is inside a container, no?
and the python version running my code should not depend on the python version running the clearml-agent (especially for experiments running in containers)
Generally speaking you are correct, but some packages will not have the same version for all python versions
Specifically in this case I think...
HungryArcticwolf62 a transformer model is, in the end, a pytorch/tf model with pre/post processing.
The pytorch/tf model inference is done with Triton (probably the most efficient engine today), while clearml runs the pre/post processing on a different CPU machine (making sure we fully utilize all the HW). Does that answer the question?
Latest docs here:
https://github.com/allegroai/clearml-serving/tree/dev
expect a release after the weekend 😉
I'm running agent inside docker.
So this means venv mode...
Unfortunately I cannot attach the logs right now; I will attach them a little later.
No worries, feel free to DM them if you feel this is too much to post here
Hi StickyBlackbird93
Yes, this agent version is rather old (clearml_agent v1.0.0)
it had a bug where the pytorch aarch64 wheel broke the agent (by default the agent in docker mode will use the latest stable version, but not in venv mode)
Basically upgrading to the latest clearml-agent version should solve the issue:
`pip3 install -U clearml-agent==1.2.3`
BTW for future debugging, this is the interesting part of the log (Notice it is looking for the correct pytorch based on the auto de...
Hi @<1523702000586330112:profile|FierceHamster54>
Nope 🙂 nothing to worry about.
That said do notice the open-source file-server is not secure, this does not mean it will spill data on the server, but it does mean that you should probably put it behind a VPN or use S3/GCP/Azure if this is open to the public internet
PompousBeetle71 if this is argparse and the type is defined, the trains-agent will pass the equivalent in the same type; with str that amounts to ''. Make sense?
Hmm... That's what happens with the exception of None/'' if type is str... There is no way to differentiate in the UI.
This is why we opted for this behavior: type=str will "cast" everything to str so you always get str, while not specifying a type will leave the variable as-is... If you have an idea on how to support both, feel free to suggest 🙂
i.e. run `pip install --upgrade trains`
I mean , the python package, not the trains-server version
Hi PompousBeetle71
I remember it was an issue, but it was solved a while ago. Which Trains version are you using?
And the trains version?
PompousBeetle71 Could you check with 0.14.3 that just released?
Hmm, I still wonder what the "correct" answer is for most people. Is an empty string in argparse redundant anyhow? Will someone ever use it?
PompousBeetle71 is this ArgParser argument or a connected dictionary ?
HandsomeCrow5 if you want to edit the Task object you can just use:
```
internal_task_representation = task.data
internal_task_representation.execution.script = ...
task._edit(execution=internal_task_representation.execution)
```
This will make sure you do not need to worry about API version etc. the Task object will take care of it.
BTW: it seems a few more people wanted this ability, maybe we should add a proper .edit method to Task. Thoughts?
You can already sort and filter experiments based on any hyper parameter or metric that the experiment reports, there is no need for any custom language query. Also all created filter/sorted table can be shared exactly as they are, so you can create leaderboards and share specific filters. You can also use the search bar in order to filter based on experiment name / comment. Tags will be added soon as well 🙂
Example of custom columns is here (the screen grab is a bit old, now there is als...
Hi StickyMonkey98
I'm (again) having trouble with the lack of documentation regarding Task.get_tasks(task_filter={STUFF}).
Yes we really have to add documentation there... Let me add that to the todo list
How do I filter tasks by time started? It seems there's a "started" property, and the web ui uses "started" as a key-word in the url query, but task_filter results in an error when I try that...Is there some other filter keyword for filtering by start-time??
last 10 started ...
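Something along these lines should work (a hedged sketch; the task_filter fields are passed through to the tasks.get_all API, so treat the exact field names as assumptions):
```
# a hedged sketch: fetching the last 10 started tasks via task_filter
from clearml import Task

tasks = Task.get_tasks(
    project_name="my_project",      # placeholder project name
    task_filter={
        "order_by": ["-started"],   # newest "started" first
        "page": 0,
        "page_size": 10,
    },
)
for t in tasks:
    print(t.id, t.name)
```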
Hi BeefyHippopotamus73
. I checked the template task and the list of “Installed Packages” indeed does not have one of my required packages in the list.
Basically the "installed packages" list is auto-populated based on the packages directly imported in your code base.
Could it be that you do not directly import snowflake-connector-python, and it is a derivative package (i.e. required by a different package)?
BTW: when you clone your Task in the UI you can edit and add the missing packages,...
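If it is indeed an indirect dependency, you could also force it into the requirements from code (a hedged sketch; the project/task names are placeholders, and the call must come before Task.init):
```
# a hedged sketch: explicitly adding a package that is not directly imported
from clearml import Task

# must be called before Task.init so it lands in "Installed Packages"
Task.add_requirements("snowflake-connector-python")
task = Task.init(project_name="my_project", task_name="my_task")
```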
Hi @<1570220858075516928:profile|SlipperySheep79>
I think this is more complicated than one would expect. But as a rule of thumb, console logs and metrics are the main ones. I hope that helps. Maybe sort by number of iterations in the experiment table?
BTW: probably better to ask in the channel
The -m src.train is just the entry script for the execution; all the rest will be taken care of by the Configuration section (whatever you pass after it will be ignored if you are using argparse, as it auto-connects with ClearML)
Make sense ?
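For context, a minimal sketch of what I mean (a hypothetical src/train.py relying on the argparse auto-connect):
```
# a minimal sketch of a hypothetical src/train.py using the argparse auto-connect
from argparse import ArgumentParser
from clearml import Task

# Task.init hooks argparse, so the arguments are logged to the Configuration section
task = Task.init(project_name="examples", task_name="train")

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()  # when executed by the agent, values from the UI override these
```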
Basically run the agent in virtual environment mode. JumpyDragonfly13 try this one (notice no --docker flag):
```
clearml-agent daemon --queue interactive --create-queue
```
Then from the "laptop" try to get a remote session with:
```
clearml-session
```
Our remote machine is Windows 10
JumpyDragonfly13 seems like the Windows 10 + docker is the issue (that would explain the OCI error)
Is this relevant ?
https://github.com/microsoft/WSL/issues/5100
Just making sure I understand, basically same ArgParser support we already have, but for python-fire
(which is the ability to automatically log the arguments, and then change them when executed by trains-agent), correct?
If this is the case, are you familiar with the implementation of python-fire? What I'm looking for is where exactly the parsing happens, so we could patch it and log/override values.
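For reference, a minimal python-fire usage sketch (the parsing itself happens inside the fire.Fire() call; the exact internal hook point is what we would need to find):
```
# a minimal python-fire example; all argument parsing happens inside fire.Fire()
import fire

def train(lr: float = 0.001, epochs: int = 10):
    """Hypothetical entry point, e.g. `python train.py --lr 0.01 --epochs 5`"""
    print(f"training with lr={lr}, epochs={epochs}")

if __name__ == "__main__":
    fire.Fire(train)
```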
Could you try running your code not from the git repository?
I have a theory, you never actually added the entry point file to the git repo, so the agent never actually installed it, and it just did nothing (it should have reported an error, I'll look into it)
WDYT?
If this doesn't help:
Go to your ~/clearml.conf file; at the bottom of the file you can add agent.python_binary and set it to the location of python3.6 (you can run which python3.6 to get the full path):
`agent.python_binary: /full/path/to/python3.6`
Could you manually configure the ~/trains.conf ?
(Just copy paste the section from the UI)
then try to run: `trains-agent list`