I think I found something, let me test my theory
btw: what's the OS and python version?
Okay, this points to an issue with the k8s glue, I think it somehow failed to launch the pod. Can you send me the log of the clearml-k8s-glue?
I'm looking into the savefig issue, meanwhile you can disable the popup by adding the following at the top of your code:
import matplotlib
matplotlib.rcParams['backend'] = 'agg'
import matplotlib.pyplot
matplotlib.pyplot.switch_backend('agg')
The agent is installing the "Installed Packages" section of the Task (think of it as its requirements.txt)
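For reference, the section content is plain requirements.txt syntax, something like (package names/versions here are purely illustrative):
clearml == 1.4.1
numpy == 1.23.0
torch == 1.13.1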
And again, what do you have there? Is it the outcome of the Task.init auto populating it?
With remote_execution it is command="[...]", but on local it is command='train' like it is supposed to be.
I'm not sure I follow, could you expand?
Hmm that is odd, could it be you are changing sys.path?
(What I'm assuming is happening: it detects the packages in the PYTHONPATH, and for some reason the order is different, so it finds the "system" package before the "venv" package, hence the incorrect version)
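A quick way to verify which copy actually gets imported (a minimal sketch, plain python, nothing clearml-specific):
import sys
import clearml

print(sys.path)             # the import search order
print(clearml.__file__)     # which installed copy actually won
print(clearml.__version__)  # and its version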
Hi CrookedWalrus33
When we enqueue the task to run remotely, not all conda packages are installed,
Yes, it actually lists all the python packages inside "installed packages" regardless of whether they are coming from pip / conda. Internally it holds the conda part in a separate section (maybe we should present it?!)
and the task is failing (they
Can you provide the log for the Task executed by the agent?
Hi EmbarrassedSpider34
Long story (see below) short, yes you can ignore this warning :)
Specifically, torch is spinning up processes and killing them; every process will have a reference to the parent semaphore (for internal clearml bookkeeping). Now python is not very good with this kind of thing (and it is getting better in newer python versions); bottom line, python "thinks" someone lost a semaphore, but the reality is that the subprocess never created it in the first place. Does that make sen...
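If it helps, here is a minimal standalone sketch of the ownership pattern I'm describing (not ClearML code, and not guaranteed to trigger the warning on every platform / python version):
import multiprocessing as mp

def worker(sem):
    # the child only uses the semaphore it inherited, it never created it
    with sem:
        pass

if __name__ == '__main__':
    sem = mp.Semaphore(1)  # created and owned by the parent process
    p = mp.Process(target=worker, args=(sem,))
    p.start()
    p.join()
    # the resource tracker may still attribute the semaphore to the child,
    # which is what produces the "leaked semaphore" warning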
EmbarrassedSpider34
Sync_folder and upload
Several times along the code and then
Do notice they overwrite one another...
EmbarrassedSpider34 I can update that an RC should be out later today with a fix 🙂
Hi @<1572395181150310400:profile|DeterminedHare56>
Yes, Slack is not the best for knowledge sharing, but it is the easiest for users to communicate over, and the easiest to set up and scale.
Specifically, you can find the historical log of the Slack channel here: None
We hoped Google would index it, but it seems this is still not working as expected; if you have any input, it would be great to improve it
Hi ThoughtfulBadger56
Just add --stop to the clearml-agent command
(the exact same command as you used to spin it up, just add --stop at the end and it will stop it; or just run clearml-agent daemon --stop and it will iteratively close them)
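For example, if the agent was originally spun up with (queue name here is illustrative):
clearml-agent daemon --queue default --detached
then stopping it is:
clearml-agent daemon --queue default --stop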
WickedGoat98 what's the clearml version you are using?
Hmmm, are you running inside pycharm, or similar?
Hmm SuccessfulKoala55 what do you think?
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
and you have clearml v0.17.2 installed on the "system" packages level, and 0.17.5rc6 installed inside the pyenv venv?
Hi DefeatedCrab47
You should be able to change the Web server port, but the API port (8008) cannot be changed. If you can log in to the web app and create a project, it means everything is okay. Notice that when you configure trains (trains-init) the port numbers are correct 🙂
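For reference, after trains-init the relevant section in ~/trains.conf should look roughly like this (these are the default hosts/ports, yours may differ):
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
}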
Hi @<1547028052028952576:profile|ExuberantBat52>
from clearml import Task

task = Task.get_task(...)
print(task.data)
wdyt?
It seems stuck somewhere in the python path... Can you check at runtime what's in os.environ['PYTHONPATH']?
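e.g. (plain python, run from inside the task):
import os
print(os.environ.get('PYTHONPATH', '<not set>'))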
Hi @<1547028116780617728:profile|TimelyRabbit96>
Trying to do model inference on a video, so first step in Preprocess class is to extract frames.
Basically this depends on the RestAPI; usually you would be sending a link to the data to be processed and returned synchronously.
What you should have is a custom endpoint doing the extraction, sending the raw data into another endpoint doing the model inference; basically think "pipeline" endpoints:
[None](https://github.com/allegro...
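A rough sketch of what the extraction endpoint could look like (assuming the standard clearml-serving Preprocess interface; the request fields, frame handling, and cv2 usage are illustrative):
import cv2  # assumed available in the serving container

class Preprocess(object):
    def preprocess(self, body, state, collect_custom_statistics_fn=None):
        # assume the request body carries a link/path to the video
        cap = cv2.VideoCapture(body['video_url'])
        frames = []
        ok, frame = cap.read()
        while ok:
            frames.append(frame)
            ok, frame = cap.read()
        cap.release()
        # each frame would then be forwarded to the model-inference endpoint
        return frames

    def postprocess(self, data, state, collect_custom_statistics_fn=None):
        # pack the per-frame results back into the response
        return {'predictions': data}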
oh, then this is user/pass (pass is the same as app key / secret)
None
"erasing" all the packages that had been set in the base task I'm cloning from. I
Set is not add: if you are calling set_packages, you are overwriting all of them with this single call.
You can however do:
task_data = task.export_task()
# "pip" holds the requirements as one newline-separated string
requirements = task_data["script"]["requirements"]["pip"].split("\n")
requirements += ["new_package"]  # append whatever extra packages you need
task.set_packages(requirements)
I guess we should have get_requirements?!
Omg that's a lot of submodules!
It has nothing to do with what the task sees; if you are inside a git repo you will have to clone it on the remote machine. Let me check in the code, maybe you have a workaround
Hi @<1547028116780617728:profile|TimelyRabbit96>
You are absolutely correct, we need to allow overriding the configuration
The code you want to change is here:
None
You can try:
channel = self._ext_grpc.aio.insecure_channel(
    triton_server_address,
    options=[  # grpc channel options are (key, value) tuples
        ('grpc.max_send_message_length', 512 * 1024 * 1024),
        ('grpc.max_receive_message_length', 512 * 1024 * 1024),
    ],
)